clonehd's People

Contributors

andrej-fischer, juliangehring

clonehd's Issues

cnv normalization error

Great job.
But something is wrong with filterHD. When I try to use filterHD to get the bias estimate, it says
Initial range is 1.010e+00 < x < 3.132e+03
eval jump sigma -llh
ERROR
Aborted (core dumped)
There is no more information about this error. I thought something might be wrong with my input, so I tested it with head -n. The funny thing is that the program reports the error at a particular line, but those lines look fine, with values greater than 0. Even if I remove all the 0s or change 0 to 1, the program still crashes. But when I split the segments into tiny segments and run the analysis, it sometimes works. So I really don't know what the problem is.

Regards,

John
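
Since the crash gives no line-level detail, one way to narrow it down is to scan the input for rows that look malformed before running filterHD. The following is a minimal sketch, not part of cloneHD. It assumes a whitespace-separated file whose first two columns are chromosome and locus, followed by numeric columns, and it assumes that loci should increase within a chromosome. The function name find_suspect_lines is hypothetical, and none of these checks is confirmed to be the cause of the abort above.

```python
from typing import Iterable, List, Tuple


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for rows that look malformed.

    Assumed layout (not confirmed by the cloneHD docs quoted here):
    chromosome, locus, then one or more numeric columns.
    """
    problems = []
    last_locus = {}  # last locus seen for each chromosome
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) < 3:
            problems.append((number, "fewer than 3 columns"))
            continue
        chrom = fields[0]
        try:
            locus = int(fields[1])
            values = [float(v) for v in fields[2:]]
        except ValueError:
            problems.append((number, "non-numeric field"))
            continue
        if any(v < 0 for v in values):
            problems.append((number, "negative value"))
        if chrom in last_locus and locus <= last_locus[chrom]:
            problems.append((number, "locus not increasing"))
        last_locus[chrom] = locus
    return problems


if __name__ == "__main__":
    sample = ["1 100 5 1", "1 200 7 1", "1 150 3 1", "2 50 -1 1"]
    for number, reason in find_suspect_lines(sample):
        print(number, reason)
```

To check a real file, pass an open file handle, for example find_suspect_lines(open(path)), where path is your own input file.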

how to create the baf input file

I noticed that the numbers in the fourth column of the BAF input file are the same as the numbers in the third column of the tumor.cna input file at the corresponding positions. Could anybody tell me how to create the BAF input file? Please help, I am under a lot of pressure from my boss on this issue.

FilterHD error

filterHD returns an error when I attempt to run the bias field analysis. I'm using run-example.sh on TCGA-derived data. The code segment and terminal output are pasted below:
Code:
`

# The tumor read depth is now analysed with the bias field from the matched normal. The diffusion constant is set to zero. If left free, it should converge to a very small value. The jump rate could be slightly higher. The LLH should be higher than for the run above, indicating the presence of the bias field. Now we are interested in the jumps.
cmd="$filterHD --data $tumorCNA --mode 3 --pre ${results}/tumor.cna.bias --bias $bias --jumps 1"
echo $cmd
$cmd
echo

`
Terminal:
filterHD: Fitting jump-diffusion model to data at 129 loci in 23 segment(s) in 1 sample(s) with a poisson emission model:
ERROR 1 in get_bias()
24 2792295 1.70e+02 9.81e+01 0.00e+00

./build/filterHD --data ./cloneHDdata/tumor.baf.txt --mode 1 --pre ./cloneHDresults/tumor.baf --jumps 1 --reflect 1 --dist 1

cloneHD error

Trying to run cloneHD using the files generated in the filterHD step. Running into an error without much explanation:

./build/cloneHD --cna tumor.exome.cloneHD.noY --chr normal.exome.cloneHD.noY --baf tumor.baf --pre output/tumor --bias normal.cna.posterior-1.txt --seed 123 --trials 2 --nmax 3 --force --max-tcn 4 --cna-jumps tumor.baf.jumps.txt --min-jump 0.01 --restarts 10 --mass-gauging 1

cloneHD: probabilistic inference of sub-clonality using...

CNA data in tumor.exome.cloneHD.noY: 193717 sites in 23 chr across 1 samples
BAF data in tumor.baf: 31724 sites in 23 chr across 1 samples

Aborted (core dumped)

Where to go from here?

Segmentation Fault

Running the example test set I run into a seg fault:

./build/filterHD --data ./test/data//normal.cna.txt --mode 3 --pre ./test/results//normal.cna

filterHD: Fitting jump-diffusion model to data at 20001 loci in 1 segment(s) in 1 sample(s) with a poisson emission model:

Filtering sample 1 of 1:
Initial range is 3.801e+01 < x < 8.500e+01
eval jump sigma rnd -llh
10 1.71212e-06 7.74107e-04 3.80764e-05 6.9367034070e+04
20 3.49225e-08 7.03853e-04 1.87274e-04 6.9362282092e+04
30 1.98473e-09 7.61654e-04 3.09235e-04 6.9361873946e+04
40 1.07903e-09 7.62688e-04 7.27041e-05 6.9361621633e+04
Adapted range to 5.493e+01 < x < 6.526e+01
10 1.32967e-09 7.25762e-04 2.95731e-05 6.9359987522e+04
20 1.05477e-09 7.33552e-04 1.30669e-05 6.9359948984e+04
30 1.03791e-09 7.37987e-04 9.48939e-07 6.9359941037e+04
Segmentation fault (core dumped)

Docs unclear wrt CNA file spec

It is unclear from the description of CNA files whether the input read counts are for reads overlapping the given position, reads overlapping the segment ending at the given position, or reads contained within the segment ending at the given position.

Location of test data?

run-example.sh uses a test data set which seems like it should be included in the repo, but isn't there. Any ideas?

provide description of baf.subclone-1.txt file

A description of the baf.subclone-1.txt files would be helpful. I assume they are similar to cna.subclone-2.txt, with the caveat that it is showing the uncertainty in which allele has x copies as a 50/50 weighting on the two copies.

#chr first-locus nloci last-locus 0 1 2 3 4
20  61098  56814 62962891 0.500 0.500 0.000 0.000 0.000
21 9427957  29809 48102613 0.500 0.500 0.000 0.000 0.000
22 16052239  31219 51229855 0.500 0.500 0.000 0.000 0.000
1  52238  13224 15436352 0.500 0.000 0.500 0.000 0.000
1 15454068     19 15463776 0.500 0.000 0.500 0.000 0.000
1 15467250     50 15506001 0.500 0.000 0.500 0.000 0.000
1 15512968     30 15536491 0.500 0.000 0.500 0.000 0.000
1 15539582     56 15560827 0.500 0.000 0.500 0.000 0.000
1 15562922      3 15564121 0.500 0.000 0.500 0.000 0.000
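
To make the interpretation above concrete, here is a minimal sketch that computes the expected copy number from one row of the file. It assumes, as the header line suggests but the documentation does not confirm, that the columns after last-locus are posterior probabilities for copy-number states 0 to 4. The function name expected_copies is hypothetical.

```python
def expected_copies(row: str) -> float:
    """Expected copy number for one row, under the assumed layout:
    chr, first-locus, nloci, last-locus, then P(state = 0), P(state = 1), ...
    """
    fields = row.split()
    probs = [float(p) for p in fields[4:]]
    # Weight each copy-number state k by its probability.
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    print(expected_copies("20  61098  56814 62962891 0.500 0.500 0.000 0.000 0.000"))
    print(expected_copies("1  52238  13224 15436352 0.500 0.000 0.500 0.000 0.000"))
```

Under this reading, a 50/50 split between states 0 and 1 averages to half a copy, and a 50/50 split between states 0 and 2 averages to one copy, which matches the "50/50 weighting" guess above without confirming it.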

how to run CloneHD for WES data

Hi,

I have matched tumor-normal WES data and I would like to run cloneHD on it. I read that this tool also works for WES data, and I am wondering where I can get more information on this.

Thanks

Maximum total copy number

It seems to me that I need to set a limit on the maximum total copy number in order to reduce computation time. The documentation says that if it is not specified, the normal copy number is used. That leaves me confused about what the maximum copy number stands for. Is it the copy number of the matched normal? If it is the maximum copy number for the tumor sample, why would the normal copy number be the default limit?

Also, I am able to set the limit separately for each chromosome, which is great. However, for that, do I need to know the subclonal composition of my samples beforehand? I actually have multiple samples, so I was hoping to set maximum copy numbers per chromosome per sample.

Bug with different number of chromosomes in SNV, mean_tcn and avail_cn files

There is a bug when running cloneHD in SNV mode with --snv ${snv}, --avail-cn ${avail_cn} and --mean-tcn ${mean_tcn}. If the number of chromosomes in ${avail_cn} is greater than the number of chromosomes in ${snv}, cloneHD exits with a segmentation fault. @vmustonen has tested this for get_avail_cn in cloneHD-functions.cpp, but it may also affect get_mean_tcn.

The problem only seems to happen if the last chr in ${avail_cn} does not have any SNV hits. One can skip intermediate chrs without a problem, so this seems to be a boundary case that has not come up before. This corner case is very rare with whole-genome data, but it is likely to cause problems for exome data.

We should build a safeguard so that the chr count in ${avail_cn} only runs up to max(SNVs chr). One solution would be to enforce max(SNVs chr) == max(avail-cn chr).
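
Until such a safeguard exists in cloneHD itself, the same condition can be enforced on the input side. The sketch below is a preprocessing workaround, not the fix in cloneHD-functions.cpp. It assumes rows already split into fields with an integer chromosome ID in the first column, and the function name trim_to_max_snv_chr is hypothetical.

```python
def trim_to_max_snv_chr(snv_rows, cn_rows):
    """Drop copy-number rows on chromosomes beyond the largest SNV chromosome.

    Both arguments are lists of rows, each row a list of string fields
    whose first field is an integer chromosome ID (an assumption).
    """
    if not snv_rows:
        return []
    max_chr = max(int(row[0]) for row in snv_rows)
    # Intermediate chromosomes without SNVs are kept; only the tail is trimmed.
    return [row for row in cn_rows if int(row[0]) <= max_chr]


if __name__ == "__main__":
    snv = [["1", "100"], ["2", "200"], ["5", "300"]]
    avail_cn = [[str(c), "1", "2"] for c in range(1, 8)]
    print([row[0] for row in trim_to_max_snv_chr(snv, avail_cn)])
```

The same call could be applied to the rows of the mean_tcn file, since the report above suspects get_mean_tcn may be affected as well.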

Normal copy number error

Trying to do a basic run on whole-exome data using run-example.sh as a template. Got as far as running cloneHD but was met with this error:

"Normal copy number for some BAF data chromosomes is not 2."

Code inspection shows some sort of calculation for normal copy number from baf data? Not really sure how to proceed from here.

Inconsistencies between CNA and BAF files

From the following I would expect the copy number between 122701000 and 122702000 to be 0, since 122702000 is the end of the first segment for the predicted region on the 3rd line. In other words, the 3rd line represents a region from 122701000 to 135534000.

tumour.cna.subclone-2.txt

10 42027000  80660 122686000 0.000 0.000 0.000 1.000 0.000
10 122687000     15 122701000 0.000 0.000 0.000 1.000 0.000
10 122702000  12833 135534000 1.000 0.000 0.000 0.000 0.000
10 135535000      1 135535000 1.000 0.000 0.000 0.000 0.000

However, according to the BAF file, some SNPs within the 122701000-122702000 interval have copy numbers 1 or 2, which is incompatible with a total copy number of 0.

tumour.baf.subclone-2.txt

10 122701679      2 122701957 0.080 0.420 0.420 0.080 0.000
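
The coordinates quoted above can be checked mechanically. The sketch below tests which of the listed CNA segments overlap the BAF segment, using only the first-locus and last-locus values from the two excerpts. It does not settle how cloneHD assigns the gap between two segments; the function name overlapping is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (chr, first_locus, last_locus)


def overlapping(segments: List[Segment], chrom: int, first: int, last: int) -> List[Segment]:
    """Return the segments on chrom that share at least one position with [first, last]."""
    return [s for s in segments if s[0] == chrom and s[1] <= last and first <= s[2]]


# Segments copied from the tumour.cna.subclone-2.txt excerpt above.
cna = [
    (10, 42027000, 122686000),
    (10, 122687000, 122701000),
    (10, 122702000, 135534000),
    (10, 135535000, 135535000),
]
# Segment copied from the tumour.baf.subclone-2.txt excerpt above.
baf = (10, 122701679, 122701957)

if __name__ == "__main__":
    print(overlapping(cna, *baf))  # prints []
```

The empty result shows that the BAF segment lies strictly between the last locus of the second CNA segment and the first locus of the third, so neither file row covers it explicitly.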

seg fault on cloneHD snv mode with single-variant chroms

I have exome-only variants, so have some chromosomes with only 1 somatic variant. I recompiled with your latest bug fixes and still appear to be getting the seg fault. When I remove the single-variant chromosomes, no seg fault. Your help is appreciated, and we look forward to using this on more samples.
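
The workaround described above, removing the single-variant chromosomes, can be scripted. This is a minimal sketch under the assumption that each row is a list of fields with the chromosome in the first field. The function name drop_sparse_chromosomes and the min_sites parameter are hypothetical, and dropping data this way only avoids the crash, it does not fix it.

```python
from collections import Counter


def drop_sparse_chromosomes(rows, min_sites=2):
    """Keep only rows on chromosomes that have at least min_sites rows."""
    counts = Counter(row[0] for row in rows)
    return [row for row in rows if counts[row[0]] >= min_sites]


if __name__ == "__main__":
    snv = [["1", "100"], ["1", "200"], ["2", "50"], ["3", "10"], ["3", "20"]]
    print(drop_sparse_chromosomes(snv))
```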

bug 001

seg fault with very sparse data (one data point in a chr)

Provide a general makefile

The provided makefiles have hardcoded paths to library files and specify compilers that may not be available on every system. A makefile that doesn't hardcode such paths and lets users set these options through environment variables would be helpful.
