
ding-lab / music2


identifying mutational significance in cancer genomes

License: MIT License

Perl 90.56% R 6.35% Python 3.09%
smg mutation-relation clinical-correlation cancer-genomics pfam-annotation


MuSiC2

Mutational Significance in Cancer (Cancer Mutation Analysis) version 2.

Usage

Program:     music2 - Mutational Significance in Cancer (Cancer Mutation Analysis) version 2.
Version:     V0.2
Author:      Beifang Niu && Matthew Wyczalkowski

Usage:  music2 <command> [options]

Key commands:

bmr                    ...  Calculate gene coverages and background mutation rates.
smg                         Identify significantly mutated genes.
long-gene-filter            Find conditions for which significance status is no longer related to gene size. 
survival                    Create survival plots and P-values for clinical and mutational phenotypes.  
clinical-correlation        Correlate phenotypic traits against mutated genes, or against individual variants.
cosmic                      Match a list of variants to those in COSMIC, and highlight druggable targets.
cosmic-omim                 Compare the amino acid changes of supplied mutations to COSMIC and OMIM databases.
dendrix                     Discovery of mutated driver pathways in cancer using only mutation data. 
dendrix-permutation         Run the permutation test for Dendrix.
mutation-relation           Identify relationships of mutation concurrency or mutual exclusivity in genes across cases.
path-scan                   Find significantly mutated pathways in a cohort given a list of somatic mutations.
pfam                        Add Pfam annotation to a MAF file.
proximity                   Perform a proximity analysis on a list of mutations.
proximity-window            Perform a sliding window proximity analysis on a list of mutations.

help                        Display this message.

Install (Ubuntu & CentOS)

Note: Prebuilt binaries for joinx, samtools, calcRoiCovg, and bedtools are provided in the repository's bin directory. They were compiled on CentOS and tested on CentOS and Ubuntu.

Prerequisites for Ubuntu:

    sudo apt-get install build-essential \
    git \
    cmake \
    curl \
    cpanminus \
    libbz2-dev \
    libgtest-dev \
    libbam-dev \
    zlib1g-dev 

Prerequisites for CentOS:

    sudo yum install yum-utils
    sudo yum install curl
    sudo yum install git
    sudo yum install cmake
    sudo yum groupinstall "Development Tools"
    sudo yum update -y nss curl libcurl
    sudo yum install perl-devel
    sudo yum install perl-CPAN
    sudo yum install bzip2-libs
    sudo yum install zlib-devel
    sudo curl -L http://cpanmin.us | perl - --sudo App::cpanminus

Switch to a C++11-capable compiler on CentOS (required to build joinx)

Reference: https://www.softwarecollections.org/en/scls/rhscl/devtoolset-3/

1. Install the package that provides the Software Collections repository for your system.
On CentOS, install the centos-release-scl package from the CentOS repository:
    $ sudo yum install centos-release-scl
On RHEL, enable the RHSCL repository for your system:
    $ sudo yum-config-manager --enable rhel-server-rhscl-7-rpms
2. Install the collection:
    $ sudo yum install devtoolset-3
3. Start using the software collection:
    $ scl enable devtoolset-3 bash
4. Optionally, set the compiler environment variables:
    $ export CC=gcc CXX=g++

Install samtools (download samtools-0.1.19 from SourceForge: http://sourceforge.net/projects/samtools/files/samtools/0.1.19)

    tar jxf samtools-0.1.19.tar.bz2
    cd samtools-0.1.19
    make
    export SAMTOOLS_DIR=$PWD
    # calcRoiCovg's Makefile expects SAMDIR to point at these samtools libraries
    export SAMDIR=$PWD
    sudo mv samtools /usr/local/bin/

Install calcRoiCovg

    git clone https://github.com/Beifang/calcRoiCovg.git
    cd calcRoiCovg
    make
    sudo mv calcRoiCovg /usr/local/bin/

Install bedtools

    wget https://github.com/arq5x/bedtools2/archive/v2.27.1.tar.gz
    tar -zxvf v2.27.1.tar.gz
    cd bedtools2-2.27.1/
    make
    sudo cp ./bin/* /usr/local/bin/

Install joinx

    git clone --recursive https://github.com/genome/joinx.git
    cd joinx
    mkdir build
    cd build
    cmake ..
    make deps
    make
    sudo make install

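Fix joinx bugs

If the joinx build fails in StreamLineSource.cpp, edit StreamLineSource::getline so that it converts the stream state to bool explicitly (C++11 streams no longer convert to bool implicitly):

    // StreamLineSource.cpp
    bool StreamLineSource::getline(std::string& line) {
        // Returns false at end of input or on a read error, so callers stop reading.
        return static_cast<bool>(std::getline(_in, line));
    }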

Install Perl modules

    sudo cpanm Test::Most 
    sudo cpanm Statistics::Descriptive
    sudo cpanm Statistics::Distributions
    sudo cpanm Bit::Vector

Install the MuSiC2 package (replace #.# with the version number of the MuSiC2 tarball in the cloned repository)

    git clone https://github.com/ding-lab/MuSiC2
    cd MuSiC2
    sudo cpanm MuSiC2-#.#.tar.gz

Note: Python is required to run music2 dendrix and dendrix-permutation.

Examples

  1. smg test example:

Create a working directory for the MuSiC2 SMG run

    mkdir music2_smg_running
    cd music2_smg_running

Create subdirectories where the runtime logs will be written

    mkdir logs
    mkdir logs/calc_covg

Generate the list of coverage-calculation commands

    music2 bmr calc-covg --roi-file ./example/smg/example.roi_file --reference-sequence /reference_dir/ucsc.hg19.fa --bam-list ./example/smg/example.bam_list --output-dir . --cmd-list-file example.run-coverage-command

Run the ROI coverage calculation for each sample

    bash example.run-coverage-command
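Each line of the command list presumably runs calcRoiCovg for one sample, so on a multi-core machine the lines can be run concurrently instead of through a single `bash` call. A minimal Python sketch, assuming the file contains one complete shell command per line; `read_commands` and `run_command_list` are hypothetical helpers, not part of MuSiC2:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_commands(path):
    """Return the commands in the list file, skipping blank lines and '#' comments."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


def run_command_list(path, workers=4):
    """Run every command, at most `workers` at a time; return exit codes in file order."""
    commands = read_commands(path)

    def run(cmd):
        # shell=True because each line is assumed to be a full shell command
        # (it may contain redirections or pipes).
        return subprocess.run(cmd, shell=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so codes line up with the file's lines.
        return list(pool.map(run, commands))
```

For example, `run_command_list("example.run-coverage-command", workers=4)` returns one exit code per command; a non-zero code marks a sample whose coverage should be rerun before calling calc-covg again. Keeping `workers` modest is a reasonable default, since every job reads a BAM file.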

Run bmr calc-covg again, this time without --cmd-list-file, to compute gene coverages from the per-sample results

    music2 bmr calc-covg --roi-file ./example/smg/example.roi_file --reference-sequence /reference_dir/ucsc.hg19.fa --bam-list ./example/smg/example.bam_list --output-dir .

Run calc-bmr to measure overall and per-gene mutation rates. This step can be memory-intensive, so give it ample memory.

    music2 bmr calc-bmr --roi-file ./example/smg/example.roi_file --reference-sequence /reference_dir/ucsc.hg19.fa --bam-list ./example/smg/example.bam_list --maf-file ./example/smg/example.input.maf --output-dir . --show-skipped

Run SMG test using an FDR threshold appropriate for these mutation rates

    music2 smg --gene-mr-file gene_mrs --output-file smgs --max-fdr 0.05 --processors 1
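The `--max-fdr 0.05` option keeps genes whose false discovery rate is at most 5%. As an illustration, the sketch below computes FDR values from per-gene p-values with the Benjamini-Hochberg procedure (the same result as R's `p.adjust(p, method = "BH")`). Which correction, and which per-gene tests, MuSiC2's R `smg_fdr` code actually applies is an assumption here; `benjamini_hochberg` and `significant` are hypothetical names.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR values) in input order."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so the adjusted values are monotone in the original p-values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant(genes, pvalues, max_fdr=0.05):
    """Return the genes whose adjusted p-value is at or below max_fdr."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, q in zip(genes, fdr) if q <= max_fdr]
```

With p-values 0.01, 0.04, 0.03 and 0.2 for four genes, only the first gene survives a 0.05 cutoff: its FDR is 0.01 × 4 / 1 = 0.04, while the next two both adjust to about 0.053.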

  2. dendrix example:

Run the MCMC for 1000000 iterations, sampling sets of size 3 every 1000 iterations. Because a single experiment is run, two output files are produced:

    music2 dendrix --mutations-file example/dendrix/mutation_matrix --set-size 3 --minimum-freq 1 \
        --number-interations 1000000 --analyzed-genes-file example/dendrix/analyzed_genes \
        --number-experiments 1 --step-length 1000
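In the Dendrix method, a gene set M is scored by the weight W(M) = 2|Γ(M)| - Σ_{g in M} |Γ(g)|, where Γ(g) is the set of samples with a mutation in gene g and Γ(M) is the union over the genes in M. The weight rewards sets that cover many samples with mutually exclusive mutations, and it is the quantity that the `--value-tested 47` option in the permutation command refers to. A minimal Python sketch; the tab-separated `sample<TAB>gene<TAB>gene...` layout assumed for `mutation_matrix` and the helper names are assumptions, not taken from MuSiC2:

```python
def read_mutation_matrix(path):
    """Parse lines of the assumed form 'sample<TAB>gene1<TAB>gene2...'
    into a dict mapping sample ID -> set of mutated genes."""
    samples = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if fields[0]:
                samples[fields[0]] = set(fields[1:]) - {""}
    return samples


def dendrix_weight(gene_set, samples):
    """W(M) = 2*|Gamma(M)| - sum over g in M of |Gamma(g)|."""
    gene_set = set(gene_set)
    # |Gamma(M)|: samples with at least one mutated gene from the set
    covered = sum(1 for genes in samples.values() if genes & gene_set)
    # Sum of |Gamma(g)|: every (sample, gene-in-set) mutation counted once
    total = sum(len(genes & gene_set) for genes in samples.values())
    return 2 * covered - total
```

For perfectly exclusive sets the weight equals the number of covered samples; each sample in which two genes of the set co-occur lowers the weight by one.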

To compute the p-value for the second-ranked set, which has weight 47, run:

    music2 dendrix-permutation --mutations-file example/dendrix/mutation_matrix --set-size 3 --minimum-freq 1 \
        --number-interations 1000000 --analyzed-genes-file example/dendrix/analyzed_genes \
        --number-permutations 100 --value-tested 47 --rank 2
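The permutation test asks how often a permuted dataset produces a set at the given rank whose weight is at least the tested value. Assuming each of the 100 permutations yields one such weight, an empirical p-value can be estimated as sketched below. Whether Dendrix's permutation script uses the plain ratio or the add-one correction shown here is not confirmed, and `empirical_pvalue` is a hypothetical helper.

```python
def empirical_pvalue(observed, permuted_weights, add_one=True):
    """Estimate P(weight >= observed) from weights seen in permuted datasets.

    With add_one=True the estimate is (hits + 1) / (n + 1), which never
    reports exactly zero; with add_one=False it is the plain ratio hits / n.
    """
    n = len(permuted_weights)
    hits = sum(1 for w in permuted_weights if w >= observed)
    if add_one:
        return (hits + 1) / (n + 1)
    return hits / n
```

With 100 permutations, the smallest reportable p-value is 1/101 (about 0.0099) with the add-one correction, or 0 with the plain ratio. More permutations are needed to resolve smaller p-values.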

SUPPORT

If you have any questions, please contact:

Beifang Niu [email protected]
Li Ding [email protected]


Issues

Can't find function: mutation-relation

Hi,
I'm trying to use MuSiC2 to study interactions between mutated genes, but I can't run the function called mutation-relation. Will this function be updated?

music2 smg error: Argument "Reference_Allele" isn't numeric in numeric lt (<)

I get the following error running music2 smg on the music2 bmr output maf.

$ music2 smg --gene-mr-file=/path/output_bmr.maf --output-file=t.txt
Argument "Reference_Allele" isn't numeric in numeric lt (<) at /path/music2/music2-0.2/lib/TGI/MuSiC2/Smg.pm line 89, <GEN0> line 1.
Argument "Variant_Classification" isn't numeric in numeric lt (<) at /path/music2/music2-0.2/lib/TGI/MuSiC2/Smg.pm line 89, <GEN0> line 1.
Unrecognized mutation category in gene-mr-file. Inappropriate ioctl for device

I'm not sure why this is. The MAF looks correctly formatted, and columns Reference_Allele and Variant_Classification are obviously not numeric.

Any debugging pointers?

CpG coverage warning + uninitialized value warning

Hello,

We've been running MuSiC2 on some dog cancer samples to find significantly mutated genes. There are a couple of types of warnings we'd like advice on.

We have been getting warnings of the following type:

#More CpG_Transitions seen in ENSCAFG00000001106 than there are bps with sufficient coverage!
#More CpG_Transversions seen in ENSCAFG00000001136 than there are bps with sufficient coverage!

When we looked at the CpG counts in the coverage files, we saw that they were always zero. When we examined the subprogram calcRoiCovg, we found that if we swapped the inputs so that CpGs were counted before CGs, the CpG counts were no longer zero, e.g. when we ran the parallelized calc-covg step with

--bp-class-types=AT,CpG,CG

instead of

--bp-class-types=AT,CG,CpG

then CpG counts would not always have zero values. Additionally, for each ROI the new CG count plus the new CpG count equals the old CG count, e.g.

#Gene	ROI	Length	Covered	ATs_Covered	CGs_Covered	CpGs_Covered
Original output:
ENSCAFG00000000001	chr1:252393-252564	172	172	96	76	0

New output (columns swapped back for better comparison):
ENSCAFG00000000001	chr1:252393-252564	172	172	96	74	2

When we swapped the resulting (non-zero) columns back into the order expected by MuSiC2 and continued running from calc-covg using the files generated, we no longer saw the above warning.

We are not sure why this is happening. However if calcRoiCovg counts the categories in order and then removes those basepairs from consideration, that would be consistent with what we see (e.g. CpG basepairs are being counted as two CGs before CpGs are checked for).
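The reporters' hypothesis can be illustrated with a toy classifier in which each base is assigned to the first matching class. Checking the broad CG class before CpG leaves CpG at zero, and the CG + CpG total is the same under either order, which matches the coverage rows shown in this issue. This is a hypothetical model for illustration, not calcRoiCovg's actual code; `classify_bases` is an assumed name.

```python
def classify_bases(seq, class_order):
    """Count bases per class, assigning each base to the FIRST class it matches.

    Hypothetical class definitions:
      AT  - base is A or T
      CG  - base is C or G
      CpG - base is the C or the G of a 'CG' dinucleotide
    """
    seq = seq.upper()

    def is_cpg(i):
        # C followed by G, or G preceded by C
        return (seq[i] == "C" and seq[i + 1 : i + 2] == "G") or (
            seq[i] == "G" and i > 0 and seq[i - 1] == "C"
        )

    tests = {
        "AT": lambda i: seq[i] in "AT",
        "CG": lambda i: seq[i] in "CG",
        "CpG": is_cpg,
    }
    counts = {name: 0 for name in class_order}
    for i in range(len(seq)):
        for name in class_order:
            if tests[name](i):
                counts[name] += 1
                break  # first match wins; the base is not counted again
    return counts
```

For the sequence `ACGTTCAG`, the order `AT, CG, CpG` gives AT=4, CG=4, CpG=0, while `AT, CpG, CG` gives AT=4, CpG=2, CG=2. Under a first-match scheme, the more specific class has to be tested before the general one.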

We are further seeing the following warnings:

Use of uninitialized value in addition (+) at /usr/local/share/perl/5.26.1/TGI/MuSiC2/CalcBmr.pm line 452.
Use of uninitialized value $muts_in_class in subtraction (-) at /usr/local/share/perl/5.26.1/TGI/MuSiC2/CalcBmr.pm line 456.
Use of uninitialized value $muts_in_class in division (/) at /usr/local/share/perl/5.26.1/TGI/MuSiC2/CalcBmr.pm line 457.

We see these warnings regardless of the state of the CpG counts (i.e. we do not think they are caused by the change described above). Looking at the code, we think these warnings occur because gene_mr is no longer initialized to zeroes, i.e. lines 248-253 of CalcBmr.pm are commented out. This means that at lines 452 and 455 (which set muts_in_class) gene_mr is sometimes undef, which may cause muts_in_class to be undef. Since the code uses undef in arithmetic, Perl prints a warning. We don't think this causes any functional problems, since Perl treats undef as zero, as desired; mostly we want to confirm that this warning isn't a symptom of something else being wrong.


To reiterate, the questions are:

  1. Does our fix for the CpG warning sound correct? If not, is this because the described workaround will cause other problems, or because we have misdiagnosed the underlying cause of the warnings?
  2. Does our analysis of the uninitialized value warnings sound correct, and therefore we can safely ignore them?

Thanks!

TestVcfEntry.cpp.o error

/addData01/01_Program_to_install/75.MuSIC2/joinx/joinx/build/vendor/src/gtest160/include/gtest/gtest.h:269:3: note: no known conversion for argument 1 from ‘std::basic_istream’ to ‘const testing::AssertionResult&’
make[2]: *** [build/test/lib/fileformats/CMakeFiles/TestFileFormats.dir/build.make:206: build/test/lib/fileformats/CMakeFiles/TestFileFormats.dir/TestVcfEntry.cpp.o] error 1
make[1]: *** [CMakeFiles/Makefile2:357: build/test/lib/fileformats/CMakeFiles/TestFileFormats.dir/all] error 2
make: *** [Makefile:163: all] error 2

I modified StreamLineSource.cpp
and ran 'make', and this error occurs.
Please help.

Command music2 bmr calc-covg fails

The command requires the optional argument --cmd-list-file, and when that argument is given, the generated script is not run. The error message:
... not found in ../Data/Music/covG/output/roi_covgs. please make a command list file to run calcRoiCovg !

read.table error

root@sever:MUSIC2# music2 smg --gene-mr-file result/gene_mrs --output-file smgs --max-fdr 0.05 --processors 1

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
Calls: smg_test -> read.delim -> read.table
Execution halted
Error in read.table(pval_file, header = T, sep = "\t") :
no lines available in input
Calls: smg_fdr -> read.table
Execution halted
Couldn't open smgs_detailed. No such file or directory

Please help me out.

Thank you.
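R's `read.table` reports "more columns than column names" when data rows contain more fields than the header line. One way to check the file passed to `--gene-mr-file` is to compare every row's tab-separated field count with the header's. A minimal Python sketch; `find_ragged_rows` is a hypothetical helper, and, like R, it skips blank lines:

```python
import csv


def find_ragged_rows(path, sep="\t"):
    """Return (line_number, n_fields) for rows whose field count differs from the header's."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=sep)
        header = next(reader, None)
        if header is None:
            return problems  # empty file: nothing to compare
        for row in reader:
            if not row:
                continue  # blank line, skipped as read.table does by default
            if len(row) != len(header):
                problems.append((reader.line_num, len(row)))
    return problems
```

An empty result means every row matches the header's width; otherwise the reported line numbers point at the rows to inspect, for example in a file that is not the tab-delimited gene_mrs output of calc-bmr.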

Install error

In the samtools-1.9 directory:

export SAMDIR=$PWD

cd ../calcRoiCovg/

make

Please define environment variable SAMDIR to point to your samtools libraries

What should I do?

Thanks

MAF allele specification

In the file "CalcBmr.pm", lines 393-395, you require that alleles be strings consisting of A, C, G, or T. However, the MAF 2.3 documentation specifies that alleles be represented as a dash ("-") in some cases. For instance, the reference allele is a dash in the case of an insertion.
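A minimal sketch of an allele check that also accepts the MAF placeholder "-" (the Reference_Allele of an insertion, or the Tumor_Seq_Allele of a deletion). It is written in Python for illustration and is not a patch for CalcBmr.pm; `is_valid_allele` is a hypothetical name.

```python
import re

# An allele is either one or more A/C/G/T bases, or a single "-" placeholder.
ALLELE_RE = re.compile(r"[ACGT]+|-")


def is_valid_allele(allele):
    """Return True if the whole string is a base run or exactly '-'."""
    # fullmatch (not match with '$') so a trailing newline is not accepted.
    return ALLELE_RE.fullmatch(allele.upper()) is not None
```

Using `fullmatch` rejects mixed values such as `A-` or `--`, so only the two forms described in the MAF documentation pass.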

typo in the CalcBmr.pm module

https://github.com/ding-lab/MuSiC2/blob/master/lib/TGI/MuSiC2/CalcBmr.pm

On line 113, the closing parenthesis causes a syntax error; please remove it. Thanks.

my $cp_command = "cp $temp_clustering_maf_file temporary_MAF_from_mutation_cluster_function_20150121.maf" );
lab@localhost:~/MuSiC2$ music2
syntax error at /home/lab/perl5/lib/perl5/TGI/MuSiC2/CalcBmr.pm line 113, near ""cp $temp_clustering_maf_file temporary_MAF_from_mutation_cluster_function_20150121.maf" )"
Global symbol "$cp_command" requires explicit package name at /home/lab/perl5/lib/perl5/TGI/MuSiC2/CalcBmr.pm line 114.
Compilation failed in require at /home/lab/perl5/lib/perl5/TGI/MuSiC2/Bmr.pm line 8.
BEGIN failed--compilation aborted at /home/lab/perl5/lib/perl5/TGI/MuSiC2/Bmr.pm line 8.
Compilation failed in require at /home/lab/perl5/bin/music2 line 14.
BEGIN failed--compilation aborted at /home/lab/perl5/bin/music2 line 14.

permutationTestDendrix.py

I am reading your scripts and I have something to ask:
In line 153,
why do you reset the dictionary 'sample_mutatedGenes' to zero after taking so much effort to build it?

JoinX versioning

Is it necessary to use a specific version of JoinX?

$ grep -l joinx1.7 *
CalcBmr.pm
CalcWigCovg.pm
$ grep -l joinx1.8 *
CalcBmrModifier.pm
CalcWindowRoi.pm

We build joinX using the master branch from github. The executable is "joinX" without any extension. Can the perl be updated to just say "joinX"?
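One way to avoid hard-coding a versioned executable name, as the modules listed above do, is to probe a list of candidate names on PATH and use the first one found. A minimal Python sketch of that idea; the candidate list and `find_joinx` are hypothetical and are not how MuSiC2's Perl modules currently behave:

```python
import shutil

# Hypothetical candidate names, most specific first; adjust to your install.
JOINX_CANDIDATES = ("joinx1.8", "joinx1.7", "joinx", "joinX")


def find_joinx(candidates=JOINX_CANDIDATES, which=shutil.which):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

The `which` parameter defaults to `shutil.which` but can be replaced, which makes the lookup order easy to test without installing joinx.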

Excessive RAM usage with large ROI file.

We are trying to use MuSiC2 with regions of interest files that define:

  1. regulatory or conserved noncoding regions
  2. Transcription factor binding sites
  3. DNAse hypersensitive sites

We are currently trying the first of these ROI files. It has 1,408,562 regions of interest, many of which overlap extensively. We have 120 samples.

I provide 40 GB of RAM and the process dies after hitting a swap limit of 42 GB. Now I am trying 70 GB. We only have 5 TB of shared RAM and it wouldn't be fair to other users to use all of it... Any suggestions? Why is the code using so much RAM?
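With heavily overlapping regions, the same bases are presumably processed once per region that contains them. One mitigation worth trying (untested here, and not a MuSiC2 feature) is to merge overlapping ROIs of the same gene before running MuSiC2. A minimal Python sketch, assuming ROI rows of chromosome, start, stop, and gene name with inclusive coordinates; `merge_rois` is a hypothetical helper:

```python
from collections import defaultdict


def merge_rois(rows):
    """Merge overlapping ROIs that share a chromosome and gene name.

    rows: iterable of (chrom, start, stop, gene) with inclusive coordinates.
    Returns merged (chrom, start, stop, gene) tuples sorted by chrom, gene, start.
    """
    by_key = defaultdict(list)
    for chrom, start, stop, gene in rows:
        by_key[(chrom, gene)].append((int(start), int(stop)))

    merged = []
    for chrom, gene in sorted(by_key):
        intervals = sorted(by_key[(chrom, gene)])
        cur_start, cur_stop = intervals[0]
        for start, stop in intervals[1:]:
            if start <= cur_stop:  # overlaps the current block: extend it
                cur_stop = max(cur_stop, stop)
            else:  # gap: emit the current block and start a new one
                merged.append((chrom, cur_start, cur_stop, gene))
                cur_start, cur_stop = start, stop
        merged.append((chrom, cur_start, cur_stop, gene))
    return merged
```

Merging only within a gene keeps per-gene coverage and mutation counts attributable to the same gene names. Regions shared by different genes are left as they are, since merging across genes would change which gene a mutation is assigned to.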
