Comments (8)
I ran this on Windows and it completed successfully. I'm re-running under Mono now, with atop logging, to see whether I can reproduce the issue there.
from canvas.
Canvas should work for mm9, although we only provide the required reference genome data for human, so you will need to generate some of the inputs yourself from the mm9 fasta file. My answers follow each question below:
1- I guess the “Tumor-normal-enrichment” workflow should be used. Right?
Technically T/N is only appropriate for matched tumor and normal samples (i.e. from the same individual). If the WT is not from the same individual mouse as the cell line, then the more appropriate workflow would be Somatic Enrichment, which uses a control sample rather than a matched normal sample. In the end I believe these two workflows will do the same thing since you only have one control sample, but technically it is more correct to run the Somatic Enrichment mode.
2- What should be the --manifest= option in this case? The same as in the github demo?
The manifest in the github demo is for human exome so would not be appropriate for mouse. You can look at the format from the demo and modify it to include the exome regions that you targeted in the mouse genome during sequencing.
3- Is providing a filter (-f) required? I have no specifics here for my experiment. Any recommendations?
The filter bed is required but you can provide an empty file here. We use it to exclude regions that have no interesting CNVs. We probably should have made this parameter optional, but for human we always use a filter bed file to exclude centromeres from the analysis, mainly to reduce run time.
4- “--b-allele-vcf”: I did not understand what I should provide for this parameter.
Canvas needs to know at which sites to expect heterozygous SNVs for the sample, so that we can estimate how different this sample is from the expected CN=2 reference genome, which should have an allele frequency of 0.5. You should be able to use the mouse dbSNP vcf for this, but make sure to also pass the parameter --exclude-non-het-b-allele-sites, since not all of these sites are guaranteed to be heterozygous in your specific sample.
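To make the 0.5 expectation concrete, here is a minimal sketch of how a B-allele frequency at a single site compares with the diploid heterozygous expectation. This is not Canvas code: the function names and the read counts are made up for illustration only.

```python
def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate (B) allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / total


def deviation_from_diploid(ref_reads: int, alt_reads: int) -> float:
    """Distance of the observed frequency from the 0.5 expected at a CN=2 het site."""
    return abs(b_allele_frequency(ref_reads, alt_reads) - 0.5)


# Hypothetical read counts (not real data):
balanced = deviation_from_diploid(ref_reads=50, alt_reads=50)    # het site, CN=2
skewed = deviation_from_diploid(ref_reads=60, alt_reads=30)      # allelic imbalance
homozygous = deviation_from_diploid(ref_reads=0, alt_reads=80)   # not het in this sample
```

A site that is homozygous in the sample sits at the maximum deviation regardless of copy number, which is presumably why sites taken from a population vcf should be filtered with --exclude-non-het-b-allele-sites.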
5- What kind of custom parameters should I add?
Ideally you won't need to use any custom parameters, but if you are interested, after running without custom parameters you can view the CanvasLog.txt file to see which components were launched. Each component has its own list of command line parameters that you may wish to modify. Probably the most common custom parameter is to use custom bins. You may want to convert each targeted region into a separate bin. That would give you the best signal-to-noise ratio in your copy number calling.
6- The “-m” parameter is found in the demo but not in the actual help of CANVAS? What is its usage?
The -m option you see is actually part of the --custom-parameters option. Basically this option:
--custom-parameters=CanvasBin,-m=TruncatedDynamicRange
is saying, when running the CanvasBin step, use custom parameter -m=TruncatedDynamicRange which will modify the binning mode. You can see all the options to CanvasBin by running mono /illumina/sync/software/unofficial/Canvas/latest/CanvasBin.exe
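As an illustration of the `<component>,<parameter>` shape described above, here is a small sketch that builds and splits such an option string. The helper names are hypothetical, and this only mirrors the format shown in the demo; it is not how Canvas itself parses the option.

```python
from typing import Tuple

PREFIX = "--custom-parameters="


def custom_parameters_option(component: str, parameter: str) -> str:
    """Format one option as --custom-parameters=<component>,<parameter>."""
    if "," in component:
        raise ValueError("component name must not contain a comma")
    return f"{PREFIX}{component},{parameter}"


def split_custom_parameters(option: str) -> Tuple[str, str]:
    """Split the option back into (component, parameter) at the first comma."""
    if not option.startswith(PREFIX):
        raise ValueError("not a --custom-parameters option")
    component, _, parameter = option[len(PREFIX):].partition(",")
    return component, parameter


# The example from the demo: pass -m=TruncatedDynamicRange to the CanvasBin step.
option = custom_parameters_option("CanvasBin", "-m=TruncatedDynamicRange")
component, parameter = split_custom_parameters(option)
```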
Starting with the first step, I tried to use FlagUniqueKmers.exe to convert the mm9 fasta into the input fasta needed by Canvas, but I get an exception.
The command line I used is:
/home/code/mono-4.0.2/bin/mono /home/code/canvas/Canvas-1.11.0_x64/1.11.0/Tools/FlagUniqueKmers/FlagUniqueKmers.exe mm9.fa mm9_kmer.fa
The program runs for some time, displaying lines such as:
6/28/2016 2:36:46 PM Start
Load FASTA file at /kitty/data/mouse/mm9_ref/mm9.fa, write kmer-flagged output to /home/asahyoun/Projects/4T1_CNV/CANVAS/mm9_kmer.fa
1 chr1 0 dict 0 incomplete 0
1 chr1 1000000 dict 0 incomplete 0
1 chr1 2000000 dict 0 incomplete 0
1 chr1 3000000 dict 0 incomplete 0
1 chr1 4000000 dict 947869 incomplete 0
...
Then, I get the following error:
Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
at System.Collections.Generic.InternalStringComparer.Equals (System.String x, System.String y) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].FindEntry (System.String key) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].ContainsKey (System.String key) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.ProcessOneChromosome (GenericRead fastaEntry, Int32 chromosomeIndex) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.Main (System.String fastaPath, System.String outputPath) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.Program.Main (System.String[] args) [0x00000] in <filename unknown>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object
at System.Collections.Generic.InternalStringComparer.Equals (System.String x, System.String y) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].FindEntry (System.String key) [0x00000] in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[System.String,System.Int64].ContainsKey (System.String key) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.ProcessOneChromosome (GenericRead fastaEntry, Int32 chromosomeIndex) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.Main (System.String fastaPath, System.String outputPath) [0x00000] in <filename unknown>:0
at FlagUniqueKmers.Program.Main (System.String[] args) [0x00000] in <filename unknown>:0
The environment where I execute this has the following information:
OS: Suse Linux Enterprise 11 – Service pack 2
Mono version 4.0.2 (Compiled with gcc 5.2)
I have FlagUniqueKmers running using mono 4.2.3. It seems you are running into a bug with earlier versions of mono. Note that the FlagUniqueKmers process will take a long time (several days or a week) to complete.
Running FlagUniqueKmers under mono 4.2.3 and mono 4.4.1 eventually fails with:
Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
at System.Collections.Generic.GenericEqualityComparer`1[T].Equals (System.Collections.Generic.T x, System.Collections.Generic.T y) <0x2aaab31b1540 + 0x0004d> in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[TKey,TValue].FindEntry (System.Collections.Generic.TKey key) <0x4003ede0 + 0x000f3> in <filename unknown>:0
at System.Collections.Generic.Dictionary`2[TKey,TValue].ContainsKey (System.Collections.Generic.TKey key) <0x4003eda0 + 0x00019> in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.ProcessOneChromosome (GenericRead fastaEntry, Int32 chromosomeIndex) <0x4003df60 + 0x00540> in <filename unknown>:0
at FlagUniqueKmers.KmerChecker.Main (System.String fastaPath, System.String outputPath) <0x40013f90 + 0x0025f> in <filename unknown>:0
at FlagUniqueKmers.Program.Main (System.String[] args) <0x40012d50 + 0x000bf> in <filename unknown>:0
@StephenTanner Are you still running under Mono? I tried using Canvas under Mono a few months ago, but ran into so many stack traces that I gave up. I'm attaching my manifest creation code in case anyone finds it useful:
## This is a rule for use in Snakemake
rule create_canvas_xml:
    input: fasta=config["mouse_fasta"]
    output: xml="GenomeSize.xml", genome="genome.fa"
    params: runtime="7200", memory="2G"
    run:
        from pyfaidx import Fasta
        from collections import Counter
        import hashlib
        import os
        with open(input.fasta) as fasta, open(output.genome, 'w') as genome:
            for line in fasta:
                genome.write(line)
        with Fasta(input.fasta) as genome, open(output.xml, 'w') as genomesize:
            genomesize.write('<sequenceSizes genomeName="{0}">\n'.format(os.path.basename(config['mouse_fasta'])))
            for chrom in genome:
                # <chromosome fileName="genome.fa" contigName="chrM" totalBases="16571" isCircular="false" md5="d2ed829b8a1628d16cbeee88e88e39eb" ploidy="2" knownBases="16571" type="Mitochondria" />
                fileName = os.path.basename(input.fasta)
                contigName = chrom.name
                totalBases = len(chrom)
                isCircular = "false"
                md5 = hashlib.md5(str(chrom).encode('ascii')).hexdigest()
                ploidy = "2"
                counts = Counter(str(chrom).upper())
                knownBases = str(counts['A'] + counts['T'] + counts['C'] + counts['G'])
                if "M" in chrom.name:
                    type = "Mitochondria"
                elif "X" in chrom.name:
                    type = "Allosome"
                elif "Y" in chrom.name:
                    type = "Allosome"
                else:
                    type = "Autosome"
                genomesize.write('<chromosome fileName="{fileName}" contigName="{contigName}" totalBases="{totalBases}" isCircular="false" md5="{md5}" ploidy="2" knownBases="{knownBases}" type="{type}" />\n'.format(**locals()))
            genomesize.write('</sequenceSizes>')
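For anyone who wants to sanity-check the per-chromosome attributes without Snakemake or pyfaidx, the same logic can be sketched as a plain function and tried on a toy sequence. The function name `chromosome_attributes` and the toy input are made up for this example, and the name-based type detection is the same simple heuristic as in the rule above, so it may misclassify unusually named contigs.

```python
import hashlib
from collections import Counter


def chromosome_attributes(name: str, sequence: str) -> dict:
    """Compute the values written into one <chromosome .../> element."""
    # knownBases counts A/C/G/T case-insensitively; N and other symbols are excluded.
    counts = Counter(sequence.upper())
    known_bases = sum(counts[base] for base in "ACGT")
    # Classify the contig from its name (substring match, as in the rule above).
    if "M" in name:
        chrom_type = "Mitochondria"
    elif "X" in name or "Y" in name:
        chrom_type = "Allosome"
    else:
        chrom_type = "Autosome"
    return {
        "contigName": name,
        "totalBases": len(sequence),
        # md5 is taken over the sequence exactly as given (case preserved).
        "md5": hashlib.md5(sequence.encode("ascii")).hexdigest(),
        "knownBases": known_bases,
        "type": chrom_type,
    }


# Toy input, not real mouse sequence.
toy = chromosome_attributes("chrX", "ACGTNNacgt")
```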
I was able to reproduce the issue when running under Mono. I suspect that we may be able to resolve this by updating to either a new Mono release or to .NET Core. For now, a workaround is to run the tool under .NET on a Windows system.
Now that we have switched to dotnet in the latest version, FlagUniqueKmers seems to be running fine on Linux. Be sure to set the environment variable as mentioned in #48.