
Canvas usage for mouse (canvas, 8 comments, closed)

illumina commented on August 24, 2024
Canvas usage for mouse

from canvas.

Comments (8)

StephenTanner commented on August 24, 2024

I ran this on Windows and it completed successfully. I'm re-running under Mono now, with atop logging, to see whether I can reproduce the issue there.


eroller commented on August 24, 2024

Canvas should work for mm9, although we only provide the required reference genome data for human, so you will need to generate some of the inputs yourself from the mm9 fasta file. My comments follow each question below:

1- I guess the “Tumor-normal-enrichment” workflow should be used. Right?
Technically T/N is only appropriate for matched tumor and normal samples (i.e. from the same individual). If the WT is not from the same individual mouse as the cell line, then the more appropriate workflow is Somatic Enrichment, which uses a control sample rather than a matched normal sample. In the end I believe these two workflows will do the same thing since you only have one control sample, but technically it is more correct to run the Somatic Enrichment mode.

2- What should be the --manifest= option in this case? The same as in the github demo?
The manifest in the github demo is for human exome so would not be appropriate for mouse. You can look at the format from the demo and modify it to include the exome regions that you targeted in the mouse genome during sequencing.

3- Is providing a filter (-f) required? I have no specifics here for my experiment. Any recommendations?
The filter bed is required but you can provide an empty file here. We use it to exclude regions that have no interesting CNVs. We probably should have made this parameter optional, but for human we always use a filter bed file to exclude centromeres from the analysis, mainly to reduce run time.

4- “--b-allele-vcf” I did not get what should I fill in for this parameter.
Canvas needs to know at which sites to expect heterozygous SNVs for the sample, so that we can estimate how different this sample is from the expected CN=2 reference genome, which should have an allele frequency of 0.5. You should be able to use the mouse dbSNP VCF for this, but make sure to also pass the parameter --exclude-non-het-b-allele-sites, since these sites are not all guaranteed to be heterozygous in your specific sample.
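
To illustrate the 0.5 expectation: in a pure sample, a heterozygous site in a region with n total copies, k of which carry the B allele, has an expected B-allele frequency of k/n. The sketch below shows only that arithmetic. `expected_baf` is a hypothetical helper name, not part of Canvas, and it ignores normal-cell contamination and sequencing noise.

```python
from fractions import Fraction


def expected_baf(total_copies: int, b_copies: int) -> Fraction:
    """Expected B-allele frequency at a heterozygous site in a pure sample.

    Hypothetical helper for illustration; not a Canvas function.
    """
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    if not 0 <= b_copies <= total_copies:
        raise ValueError("b_copies must be between 0 and total_copies")
    return Fraction(b_copies, total_copies)


# Diploid reference state (CN=2, one copy of each allele).
print(expected_baf(2, 1))
# Single-copy gain (CN=3): the B allele carries either 1 or 2 of the 3 copies.
print(expected_baf(3, 1), expected_baf(3, 2))
```

A shift of the observed frequencies away from one half at known heterozygous sites is therefore evidence of a copy number other than the balanced CN=2 state.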

5- What kind of custom parameters should I add?
Ideally you won't need to use any custom parameters, but if you are interested, after running without custom parameters you can view the CanvasLog.txt file to see which components were launched. Each component has its own list of command line parameters that you may wish to modify. Probably the most common custom parameter is to use custom bins. You may want to convert each targeted region into a separate bin. That would give you the best signal-to-noise ratio in your copy number calling.

6- The “-m” parameter is found in the demo but not in the actual help of CANVAS? What is its usage?
The -m option you see is actually part of the --custom-parameters option. Basically this option:
--custom-parameters=CanvasBin,-m=TruncatedDynamicRange
is saying, when running the CanvasBin step, use custom parameter -m=TruncatedDynamicRange which will modify the binning mode. You can see all the options to CanvasBin by running mono /illumina/sync/software/unofficial/Canvas/latest/CanvasBin.exe
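
To make the syntax concrete: the value passed to --custom-parameters in the demo pairs a component name with an option for that component, separated by the first comma. The sketch below only illustrates how the string reads. `split_custom_parameter` is a hypothetical helper, and Canvas's own parsing may differ.

```python
from typing import Tuple


def split_custom_parameter(value: str) -> Tuple[str, str]:
    """Split 'Component,option' at the first comma.

    Hypothetical helper for illustration; not Canvas's actual parser.
    """
    component, separator, options = value.partition(",")
    if not separator or not component or not options:
        raise ValueError(f"expected 'Component,option', got {value!r}")
    return component, options


component, options = split_custom_parameter("CanvasBin,-m=TruncatedDynamicRange")
print(component)  # the step that receives the option
print(options)    # the option handed to that step
```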


eroller commented on August 24, 2024

Starting from the first step, I tried using FlagUniqueKmers.exe to convert the mm9 fasta to the input fasta needed by Canvas, but I get an exception.

The command line I used is:

/home/code/mono-4.0.2/bin/mono /home/code/canvas/Canvas-1.11.0_x64/1.11.0/Tools/FlagUniqueKmers/FlagUniqueKmers.exe mm9.fa mm9_kmer.fa

The program runs for some time displaying some lines e.g.,:


6/28/2016 2:36:46 PM Start
Load FASTA file at /kitty/data/mouse/mm9_ref/mm9.fa, write kmer-flagged output to /home/asahyoun/Projects/4T1_CNV/CANVAS/mm9_kmer.fa

1 chr1 0 dict 0 incomplete 0
1 chr1 1000000 dict 0 incomplete 0
1 chr1 2000000 dict 0 incomplete 0
1 chr1 3000000 dict 0 incomplete 0
1 chr1 4000000 dict 947869 incomplete 0

...

Then, I get the following error:

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
at System.Collections.Generic.InternalStringComparer.Equals (System.String x, System.String y) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].FindEntry (System.String key) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].ContainsKey (System.String key) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.ProcessOneChromosome (GenericRead fastaEntry, Int32 chromosomeIndex) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.Main (System.String fastaPath, System.String outputPath) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.Program.Main (System.String[] args) [0x00000] in <filename unknown>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
at System.Collections.Generic.InternalStringComparer.Equals (System.String x, System.String y) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].FindEntry (System.String key) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].ContainsKey (System.String key) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.ProcessOneChromosome (GenericRead fastaEntry, Int32 chromosomeIndex) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.Main (System.String fastaPath, System.String outputPath) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.Program.Main (System.String[] args) [0x00000] in <filename unknown>:0

The environment where I execute this has the following information:
OS: Suse Linux Enterprise 11 – Service pack 2
Mono version 4.0.2 (Compiled with gcc 5.2)


eroller commented on August 24, 2024

I have FlagUniqueKmers running using mono 4.2.3. It seems you are running into a bug with earlier versions of mono. Note that the FlagUniqueKmers process will take a long time (several days or a week) to complete.


eroller commented on August 24, 2024

FlagUniqueKmers running using mono 4.2.3 and mono 4.4.1 eventually fails with:

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
at System.Collections.Generic.GenericEqualityComparer`1[T].Equals (System.Collections.Generic.T x, System.Collections.Generic.T y) <0x2aaab31b1540 + 0x0004d> in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[TKey,TValue].FindEntry (System.Collections.Generic.TKey key) <0x4003ede0 + 0x000f3> in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[TKey,TValue].ContainsKey (System.Collections.Generic.TKey key) <0x4003eda0 + 0x00019> in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.ProcessOneChromosome (GenericRead fastaEntry, Int32 chromosomeIndex) <0x4003df60 + 0x00540> in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.Main (System.String fastaPath, System.String outputPath) <0x40013f90 + 0x0025f> in <filename unknown>:0
at FlagUniqueKmers.Program.Main (System.String[] args) <0x40012d50 + 0x000bf> in <filename unknown>:0


mdshw5 commented on August 24, 2024

@StephenTanner Are you still running under Mono? I tried using Canvas under Mono a few months ago, but ran into so many stack traces that I gave up. I'm attaching my manifest creation code in case anyone finds it useful:

## This is a rule for use in Snakemake
rule create_canvas_xml:
    input: fasta=config["mouse_fasta"]
    output: xml="GenomeSize.xml", genome="genome.fa"
    params: runtime="7200", memory="2G"
    run:
        from pyfaidx import Fasta
        from collections import Counter
        import hashlib
        import os

        # Copy the reference FASTA to genome.fa unchanged.
        with open(input.fasta) as fasta, open(output.genome, 'w') as genome:
            for line in fasta:
                genome.write(line)

        # Write one <chromosome> element per sequence in the FASTA.
        with Fasta(input.fasta) as genome, open(output.xml, 'w') as genomesize:
            genomesize.write('<sequenceSizes genomeName="{0}">\n'.format(os.path.basename(config['mouse_fasta'])))
            for chrom in genome:
                # Example of the target format:
                # <chromosome fileName="genome.fa" contigName="chrM" totalBases="16571" isCircular="false" md5="d2ed829b8a1628d16cbeee88e88e39eb" ploidy="2" knownBases="16571" type="Mitochondria" />
                sequence = str(chrom)
                fileName = os.path.basename(input.fasta)
                contigName = chrom.name
                totalBases = len(chrom)
                md5 = hashlib.md5(sequence.encode('ascii')).hexdigest()
                # knownBases counts only A, C, G and T (case-insensitive), so N is excluded.
                counts = Counter(sequence.upper())
                knownBases = str(counts['A'] + counts['T'] + counts['C'] + counts['G'])
                # Classify the contig by a substring match on its name.
                if "M" in chrom.name:
                    type = "Mitochondria"
                elif "X" in chrom.name or "Y" in chrom.name:
                    type = "Allosome"
                else:
                    type = "Autosome"
                genomesize.write('<chromosome fileName="{fileName}" contigName="{contigName}" totalBases="{totalBases}" isCircular="false" md5="{md5}" ploidy="2" knownBases="{knownBases}" type="{type}" />\n'.format(**locals()))
            genomesize.write('</sequenceSizes>')
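
The per-chromosome fields written by the rule above can be checked on a small sequence without Snakemake or pyfaidx. This is a minimal sketch assuming the same attribute layout as the rule. `chromosome_entry` is a hypothetical helper name, and the chromosome-type logic simply mirrors the substring checks above.

```python
import hashlib
from collections import Counter


def chromosome_entry(file_name: str, contig_name: str, sequence: str) -> str:
    """Build one <chromosome> line in the layout used by the rule above.

    Hypothetical helper for illustration only.
    """
    md5 = hashlib.md5(sequence.encode("ascii")).hexdigest()
    counts = Counter(sequence.upper())
    # Only A, C, G and T count as known bases; N and other codes do not.
    known_bases = sum(counts[base] for base in "ACGT")
    if "M" in contig_name:
        chrom_type = "Mitochondria"
    elif "X" in contig_name or "Y" in contig_name:
        chrom_type = "Allosome"
    else:
        chrom_type = "Autosome"
    return (
        f'<chromosome fileName="{file_name}" contigName="{contig_name}" '
        f'totalBases="{len(sequence)}" isCircular="false" md5="{md5}" '
        f'ploidy="2" knownBases="{known_bases}" type="{chrom_type}" />'
    )


# Toy contig: 8 bases in total, 2 of them unknown (N).
print(chromosome_entry("genome.fa", "chr1", "ACGTNNac"))
```

Because the type is chosen by substring match, any contig whose name happens to contain "M", "X" or "Y" would be classified accordingly, so it may be worth checking the output for unplaced or random contigs.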


StephenTanner commented on August 24, 2024

I was able to reproduce the issue when running under Mono. I suspect that we may be able to resolve this by updating to either a newer Mono release or to .NET Core. For now, a workaround is to run the tool under .NET on a Windows system.


eroller commented on August 24, 2024

Now that we have switched to dotnet in the latest version, FlagUniqueKmers seems to be running fine on Linux. Be sure to set the environment variable as mentioned here:
#48
