getzlab-pcawg-mutsig2cv_nc's Introduction

MutSig2CV_NC (PCAWG)

MutSig2CV (Lawrence et al., 2014) adapted for noncoding significance analysis, as run for the PCAWG drivers paper (Rheinbay et al., 2020).

Installing

MutSig is implemented in MATLAB. If you have a MATLAB installation and wish to run MutSig interactively on the MATLAB console, skip to the Running section below. If you do not have MATLAB installed, or do not wish to run interactively, MutSig can be run as a standalone executable. The standalone executable is available for 64-bit Linux systems only and requires the MATLAB R2016a runtime (MCR) to be installed. The runtime and its installation instructions are available from the MathWorks website.

Once the runtime is successfully installed, you must add it to your LD_LIBRARY_PATH.

MCRROOT=<path to runtime you specified when installing>
export LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/opengl/lib/glnxa64
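
Before running the executable, it can help to confirm that the four directories added above actually exist under MCRROOT. The following is a minimal sketch; check_mcr_dirs is a hypothetical helper written for this example and is not part of MutSig or the MATLAB runtime.

```shell
# Hypothetical helper: report which of the four runtime directories
# referenced in LD_LIBRARY_PATH are missing under the given MCRROOT.
# The body runs in a subshell so its variables do not leak to the caller.
check_mcr_dirs() (
    root="$1"
    missing=0
    for sub in runtime/glnxa64 bin/glnxa64 sys/os/glnxa64 sys/opengl/lib/glnxa64; do
        if [ ! -d "${root}/${sub}" ]; then
            echo "missing: ${root}/${sub}" >&2
            missing=1
        fi
    done
    # Succeed only if every directory was found.
    [ "${missing}" -eq 0 ]
)
```

For example, check_mcr_dirs "${MCRROOT}" returns a non-zero status and names each missing directory on stderr if the runtime path was mistyped.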

To run on the cohorts in the manuscript, MutSig requires ~100 GB of reference files, because each cohort has a unique set of covariates and coverage models (see "Cohort parameters" below). These files are too large to include in a GitHub repository, so they are hosted in cloud storage; please see the README in the ref/ folder for instructions on how to obtain them.

Running

To run interactively on the MATLAB console, start MATLAB in this directory, and run:

MutSig2CV_NC(<path to mutations>, <path to output directory>, <path to cohort parameters>)

To run the standalone application, cd to this directory, and run:

bin/MutSig2CV_NC <path to mutations> <path to output directory> <path to cohort parameters>

MutSig looks for its reference files relative to this directory, so it must be run from here. As stated previously, the standalone application only works on 64-bit Linux systems. As distributed, the interactive version also only works on 64-bit Linux, but it can be made to work on any platform if all supplied .mex C/C++ extensions are recompiled.
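
Because the executable must be started from the repository directory while the mutations file and output directory are given as absolute paths, a small wrapper can take care of both. This is only a sketch: abspath and run_mutsig are hypothetical helpers, and the cohort parameters path is passed through unchanged on the assumption that it is given relative to the repository (as in run/params/...).

```shell
# Hypothetical helper: print an absolute form of a path.
abspath() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$(pwd)/$1" ;;
    esac
}

# Hypothetical wrapper: resolve the input paths, then run the standalone
# executable from inside the repository directory.
# Usage: run_mutsig <repo dir> <mutations> <output dir> <cohort parameters>
# The body runs in a subshell so the caller's working directory is unchanged.
run_mutsig() (
    repo="$1"
    muts=$(abspath "$2")
    out=$(abspath "$3")
    params="$4"
    cd "$repo" || exit 1
    bin/MutSig2CV_NC "$muts" "$out" "$params"
)
```

Running the function body in a subshell keeps the cd local, so the wrapper can be called repeatedly from any directory.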

Each input is described below.

Description of inputs

  • Mutations file: Absolute path to the set of mutations to analyze. The format is specified in the following section, Mutation Input Format.

  • Output directory: Absolute path to the directory where MutSig will save its output. Will be created if necessary. NB: Any previous MutSig results in this directory will be overwritten!

  • Cohort parameters: Each cohort and noncoding feature (e.g., promoters, UTRs, enhancers) analyzed in the manuscript has its own set of reference files, because genomic coverage in noncoding regions can vary substantially from cohort to cohort and MutSig requires an accurate estimate of sequencing coverage to properly compute mutation densities. MutSig must be told explicitly which cohort/feature-specific parameter file to use; these files are located in run/params. For example, to analyze transcription factor binding sites in the glioblastoma cohort, use run/params/CNS-GBM_TFBS.params.txt. Cohort parameter file names follow the convention <cohort>_<feature>.params.txt, where <cohort> corresponds to the "Tier 3 Abbreviations" in the PCAWG tumor subtype table, and <feature> corresponds to the following noncoding features:

    • enhancers: Enhancers
    • gc_pc.3utr: 3' UTRs
    • gc_pc.5utr: 5' UTRs
    • lncrna: lncRNA genes
    • lncrna.prom: lncRNA promoters
    • mirna.mat: Mature miRNAs
    • mirna.pre: Pre-miRNAs
    • mirna.prom: miRNA promoters
    • promoters: Coding gene promoters
    • TFBS: Transcription Factor Binding Sites
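
The naming convention can be sketched as a small shell function that assembles the parameter file path from a cohort abbreviation and a feature name. params_path is a hypothetical helper; the .params.txt suffix follows the example filename run/params/CNS-GBM_TFBS.params.txt, and the function only recognizes the features listed above, so it would need extending if the repository ships parameter files for other features.

```shell
# Hypothetical helper: build the cohort parameter file path.
# Usage: params_path <cohort> <feature>
params_path() {
    case "$2" in
        enhancers|gc_pc.3utr|gc_pc.5utr|lncrna|lncrna.prom|mirna.mat|mirna.pre|mirna.prom|promoters|TFBS)
            printf 'run/params/%s_%s.params.txt\n' "$1" "$2"
            ;;
        *)
            # Feature is not in the list documented above.
            echo "unknown feature: $2" >&2
            return 1
            ;;
    esac
}
```

For instance, params_path CNS-GBM TFBS prints run/params/CNS-GBM_TFBS.params.txt, matching the example given above.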

Mutation Input Format

As input, MutSig takes a tab-delimited file with each line annotating a single mutation in a single patient. Columns can be in any order, with names and formats as follows. To provide maximal input flexibility, MutSig accepts synonyms for each column name. Column names are case sensitive.

  • chr: Chromosome of the mutation. MutSig only analyzes mutations on autosomal or sex chromosomes, and does not consider the mitochondrial chromosome or unplaced/alternate contigs.

    • Range: (chr)?[1..24XY]
    • Synonyms: Chromosome
  • pos: hg19 position of the mutation, 1-indexed.

    • Regex: [0-9]+
    • Synonyms: Position, start, Start_position
  • patient: Unique identifier for the patient.

    • Regex: [A-Za-z0-9]+
    • Synonyms: Tumor_Sample_Barcode, Patient_name
  • ref_allele: hg19 reference base(s) for the position. In the case of insertions, must be "-".

    • Regex: (-|[ACGT]+)
    • Synonyms: Reference_Allele
  • newbase: Observed variant allele at the position. In the case of deletions, must be "-".

    • Regex: (-|[ACGT]+)
    • Synonyms: Tumor_Allele, Tum_allele, Alt_allele, Alternate_allele, Tumor_Seq_Allele2

Note that MutSig does not require any other mutation annotations; it infers everything else on its own.
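
A mutations file can be checked against the column rules above before starting a long run. The following awk-based sketch is not part of MutSig: check_mutations is a hypothetical helper, the patterns are taken literally from the list above, and the chromosome pattern reflects one reading of (chr)?[1..24XY] (an optional "chr" prefix followed by 1 through 24, X, or Y).

```shell
# Hypothetical helper: validate a tab-delimited mutations file.
# Prints one message per problem and returns non-zero if any was found.
check_mutations() {
    awk '
    BEGIN {
        FS = "\t"
        bad = 0
        # Accepted header names for each canonical column.
        syn["chr"]        = "chr Chromosome"
        syn["pos"]        = "pos Position start Start_position"
        syn["patient"]    = "patient Tumor_Sample_Barcode Patient_name"
        syn["ref_allele"] = "ref_allele Reference_Allele"
        syn["newbase"]    = "newbase Tumor_Allele Tum_allele Alt_allele Alternate_allele Tumor_Seq_Allele2"
        # Value pattern for each canonical column.
        re["chr"]        = "^(chr)?([1-9]|1[0-9]|2[0-4]|X|Y)$"
        re["pos"]        = "^[0-9]+$"
        re["patient"]    = "^[A-Za-z0-9]+$"
        re["ref_allele"] = "^(-|[ACGT]+)$"
        re["newbase"]    = "^(-|[ACGT]+)$"
        for (c in syn) {
            n = split(syn[c], names, " ")
            for (i = 1; i <= n; i++) canon[names[i]] = c
        }
    }
    NR == 1 {
        # Map each canonical column to its position in the header.
        for (i = 1; i <= NF; i++) if ($i in canon) col[canon[$i]] = i
        for (c in syn) if (!(c in col)) { print "missing column: " c; bad = 1 }
        if (bad) exit 1
        next
    }
    {
        for (c in col) if ($(col[c]) !~ re[c]) {
            print "line " NR ": bad " c " value \"" $(col[c]) "\""
            bad = 1
        }
    }
    END { if (NR == 0) bad = 1; exit bad }
    ' "$1"
}
```

Header matching is case sensitive, as stated above, and any extra annotation columns are simply ignored.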

getzlab-pcawg-mutsig2cv_nc's People

Contributors

julianhess

getzlab-pcawg-mutsig2cv_nc's Issues

MutSigCV

Hello, Julianhess

I used the MutSigCV module of GenePattern to analyze my mutation data, and it reported an error. I think something may be wrong with my input file. Could you please tell me how to fix it? The error is:

Error using gp_MutSigCV>MutSig_runCV (line 850)
not enough mutations to analyze

Error in gp_MutSigCV (line 194)
MutSigCV
v1.3

(c) Mike Lawrence and Gaddy Getz
Broad Institute of MIT and Harvard
MutSigCV: PREPROCESS
Loading mutation_file...
Loading coverage file...
Processing mutation "effect"...
WARNING: 2/21453 mutations could not be mapped to effect using mutation_type_dictionary_file:

 : [2]

----TOTAL: [2]
They will be removed from the analysis.
Processing mutation "categ"...
NOTE: unable to perform category discovery, because no chr_files available.Will use two categories: missense and null+indel.
Collapsing coverage...
Writing preprocessed files.
MutSig_preprocess finished.

The input file is as follows:
Hugo_Symbol Tumor_Sample_Barcode Variant_Classification Chromosome Start_position Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2
A1CF LM-BN-1 3'UTR 10 52563029 GT G G
A1CF LM-BN-4 3'UTR 10 52563029 GT G G
A1CF LM-BN-5 3'UTR 10 52566432 CT C C
A2M LM-BN-1 Intron 12 9220823 G GA GA
A2M LM-BN-4 Intron 12 9220823 G GA GA
A2ML1 LM-BN-1 3'UTR 12 9028242 T C C
A2ML1 LM-BN-1 3'UTR 12 9028264 G A A
AAAS LM-BN-1 Missense_Mutation 12 53715218 G A A
AACS LM-BN-1 5'UTR 12 125586335 TG CA CA
AACS LM-BN-2 5'UTR 12 125607905 A G G
AACS LM-BN-2 5'UTR 12 125607960 C G G
AACS LM-BN-3 5'UTR 12 125586308 CTT C C
AACS LM-BN-3 5'UTR 12 125586308 CTT CTTT CTTT
AACS LM-BN-4 5'UTR 12 125586308 CT C C
AACS LM-BN-4 5'UTR 12 125586308 CT CTT CTT
AACS LM-BN-1 5'UTR 12 125586359 G C C

Thanks.

Error when running MutSig2CV_NC

Hi,

I am trying to run MutSig2CV_NC on my own HCC WGS data, with MATLAB v2019a on Windows and with the MCR on Linux, using a call like:
"
MutSig2CV_NC('input/input.fiveCol.maf', 'output', 'run/params/Liver-HCC_alt_prom.params.txt')
"
It then printed:
"
WILL LOAD+RUN.
LOADING DATA
"
But then I encountered the following errors on Windows:
"
errors when using require_fields (line 9)
Structure is missing required field "num"
Error: demand_fields (line 2)
require_fields(varargin{:});
Error: map_categories_to_65 (line 15)
demand_fields(C,{'num','name'});
Error MutSig2CV_v5_load (line 78)
M.context_and_effect.context65 =
map_categories_to_65(context_and_effect_categs_file); %XXX: make sure this works
Error MutSig2CV_NC (line 96)
[M P] = MutSig2CV_v5_load(args{:});
"
and on Linux:
"
Structure is missing required field "num"
Error in demand_fields (line 2)
Error in map_categories_to_65 (line 15)
Error in MutSig2CV_v5_load (line 78)
Error in MutSig2CV_v5_wrapper (line 96)
"

Could you please tell me how I can avoid these errors? Many thanks.

Best wishes,
Jennifer

Appreciate help for downloading files for Cohort parameters

Dear developer team,

Thank you for developing the new MutSig for noncoding regions. I'm trying to download the provided resources (the cohort parameter files) and would appreciate some help with that.

I am new to Google Cloud buckets. At first I tried to install gsutil on my server but failed, so I installed it on my local desktop instead. However, one file seems too large to download, and the error indicates that my local desktop doesn't allow for a "lustre" file. I later tried to install the Google Cloud Python package to download the files, but got stuck there as well.

Is there anywhere else I could download the files from, other than the Google bucket?

Many Thanks,
Jue

No PanCan params for miRNAs?

Is there a reason why run/params contains no files for the PanCan cohort for miRNAs (mature, pre, or promoters)?

Error when running Mutsig2CV_NC

Hi Julian,

I ran the software on my own data on Linux, and the results seem odd. For example, the first lines of the enhancers output file are:
"
codelen nnei nind nnon npat nsite pCL pFN pCL2 pFN2 pCF q
97 13 15 15 15 2 NaN NaN NaN NaN NaN 1.000000e-16
607 18 8 11 11 4 NaN NaN NaN NaN NaN 1.000000e-16
1333 15 4 11 11 10 NaN NaN NaN NaN NaN 1.000000e-16
375 3 10 10 10 2 NaN NaN NaN NaN NaN 1.000000e-16
193 43 10 10 10 4 NaN NaN NaN NaN NaN 1.000000e-16
"
During the run there were 2 warnings:
"
Warning: NARGCHK will be removed in a future release. Use NARGINCHK or NARGOUTCHK instead.

In sparse_to_csr (line 26)
In scomponents (line 31)
In new_find_duplicate_samples (line 68)
In MutSig2CV_v5_load (line 218)
In MutSig2CV_v5_wrapper (line 96)
Warning: The KEYBOARD function cannot be used in compiled applications.
In keyboard (line 4)
In MutSig2CV_v5_load (line 272)
In MutSig2CV_v5_wrapper (line 96)
"

I also used MATLAB 2016b on Windows, and encountered errors like:
"
Undefined function or variable 'str2doubleq'.

Error: str2doubleq_wrapper (line 3)
out = real(str2doubleq(in));

Error: make_numeric (line 37)
x = str2doubleq_wrapper(x);

Error: MutSig2CV_v5_load (line 197)
M.mut = make_numeric(M.mut,'pos');

Error: MutSig2CV_NC (line 96)
[M P] = MutSig2CV_v5_load(args{:});
"

Could you please tell me how I should proceed? Thank you very much.

Best,
Jennifer
