exomiser / exomiser


A Tool to Annotate and Prioritize Exome Variants

Home Page: https://exomiser.readthedocs.io

License: GNU Affero General Public License v3.0

Java 94.74% HTML 5.26% Shell 0.01%
analysis exome genomics monarchinitiative phenotypes variants

exomiser's Issues

Handling multiple alternative alleles

I discussed this earlier with @pnrobinson and he proposed/we agreed on the following (if I remember correctly). I'm adding this issue so we are all on the same page here and for some further sanity checking.

Consider the case of having multiple alternative alleles with REF=A, ALT=T,G. In our individuals, we see the following:

A/A A/T T/G

Exomiser will interpret this as two variants:

0/0 0/1 0/1  ==  wt  het het
0/0 0/0 0/1  ==  wt  wt  het
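
The recoding described above can be sketched as follows. This is only an illustration of the rule as stated in this issue, not Exomiser's actual implementation: the class and method names are hypothetical, genotypes are given as VCF allele indices (A/A A/T T/G becomes 0/0 0/1 1/2), every call is treated as unphased, and missing calls (".") are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a multi-allelic site into one genotype row per ALT allele.
final class MultiAllelicSplitter {

    // Recodes one genotype from the point of view of a single ALT allele:
    // alleles equal to altIndex become 1, every other allele becomes 0.
    static String recode(String genotype, int altIndex) {
        String[] alleles = genotype.split("[/|]");
        int[] recoded = new int[alleles.length];
        for (int i = 0; i < alleles.length; i++) {
            recoded[i] = alleles[i].equals(String.valueOf(altIndex)) ? 1 : 0;
        }
        Arrays.sort(recoded); // treated as unphased, so report 0/1 rather than 1/0
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < recoded.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(recoded[i]);
        }
        return sb.toString();
    }

    // Returns one row of recoded genotypes for each ALT allele (1..altCount).
    static List<List<String>> split(List<String> genotypes, int altCount) {
        List<List<String>> rows = new ArrayList<>();
        for (int alt = 1; alt <= altCount; alt++) {
            List<String> row = new ArrayList<>();
            for (String genotype : genotypes) {
                row.add(recode(genotype, alt));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // REF=A, ALT=T,G with samples A/A, A/T, T/G
        System.out.println(split(Arrays.asList("0/0", "0/1", "1/2"), 2));
    }
}
```

With the three samples from this issue, the first row corresponds to ALT=T and the second to ALT=G.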

Exomiser parsing of indels is incorrect?

It seems that indels are not being parsed correctly, causing the AF and dbSNP lookups to fail.

For example, the VCF file contains:
chr11 61165731 . C CA

This results in the following annotation in the exomiser output: chr11:g.61165731->A, which is incorrect. It should be g.61165732 (final digit 2, not 1).

The output reports no frequency data, but this variant is actually rs11382548 with a MAF of 14%.

Not sure how common this is? Can anyone confirm?
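
One way to get from the VCF representation (C to CA at 61165731) to the expected position is to trim the bases shared at the start of REF and ALT and shift the position accordingly. The sketch below assumes that this is the intended conversion. The class and method names are hypothetical, and it does not trim shared trailing bases or left-align the variant, which a full normaliser would also do.

```java
// Hypothetical helper: trims the leading bases that REF and ALT have in common.
final class IndelTrimmer {

    // Position plus REF/ALT after trimming; an empty string stands for "-" (no bases).
    static final class Trimmed {
        final int pos;
        final String ref;
        final String alt;

        Trimmed(int pos, String ref, String alt) {
            this.pos = pos;
            this.ref = ref;
            this.alt = alt;
        }
    }

    static Trimmed trimCommonPrefix(int pos, String ref, String alt) {
        int shared = 0;
        int max = Math.min(ref.length(), alt.length());
        while (shared < max && ref.charAt(shared) == alt.charAt(shared)) {
            shared++;
        }
        // Each shared base moves the start of the variant one position to the right.
        return new Trimmed(pos + shared, ref.substring(shared), alt.substring(shared));
    }

    public static void main(String[] args) {
        Trimmed t = trimCommonPrefix(61165731, "C", "CA");
        System.out.println(t.pos + " '" + t.ref + "' -> '" + t.alt + "'");
    }
}
```

For the example in this issue the shared prefix is the single base C, so the insertion of A is reported at 61165732.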

Reformat files to use unix newlines

I would suggest that the source base use Unix newlines instead of Windows newlines. Windows newlines make it harder to work with GitHub, if nothing else. For example, if you try to edit a file (e.g. exomiser-cli/src/main/resources/jdbc.properties) within GitHub, you'll see that the diff is the entire file because the ^M characters at the end of every line get automatically stripped.

Make h2 access read-only to prevent db locking

It seems that, by default, the jdbc connection will update the h2 database during normal Exomiser runs (presumably just last-accessed fields and whatnot). This causes problems when attempting to access the h2 database concurrently, since this then requires table locking, and ultimately can result in the database entering an inconsistent state with uncommitted changes if an Exomiser run crashes. Once that happens Exomiser can no longer run until you manually go in and drop uncommitted changes in the db.

There is an easy fix that has worked for us: add the following to jdbc.properties:
ACCESS_MODE_DATA=r
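
For reference, H2 also accepts this setting as part of the JDBC URL. The sketch below only builds such a URL. How Exomiser turns jdbc.properties into a connection URL is not shown in this issue, so the class name, method name and database path here are assumptions for illustration.

```java
// Hypothetical helper: builds an H2 file URL that opens the data files read-only.
final class ReadOnlyH2Url {

    static String readOnlyUrl(String databasePath) {
        // ACCESS_MODE_DATA=r asks H2 to open the database files in read-only mode.
        return "jdbc:h2:file:" + databasePath + ";ACCESS_MODE_DATA=r";
    }

    public static void main(String[] args) {
        // The path is a placeholder, not the real location of the Exomiser database.
        System.out.println(readOnlyUrl("/path/to/exomiser"));
    }
}
```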

Behaviour of a Filterable object

Since issue #2, the VcfWriter output is now more compatible with the actual VCF spec. In particular, if a variant has not been filtered, the FILTER column should be empty for that variant:

##fileformat=VCFv4.1
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  GENOTYPE
chr3    4   .   G   C   2.2     ;VARIANT NOT ANALYSED - NO GENE ANNOTATIONS GT  0/1
chr1    1   .   A   T   2.2 PASS    ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr1    2   .   T   -   2.2 Target  ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr2    3   .   C   T   2.2 Frequency;Target    ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr3    5   .   G   C   2.2     ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1

However this has raised a question about the Filterable interface which I'd like some feedback on as it impacts a fundamental behaviour. Currently a Filterable either passes or fails, but really it can exist in one of three states - passed, failed and unfiltered/not yet filtered. Given this:

  1. Should a Filterable (VariantEvaluation or Gene) report true or false to passedFilters if no filters have been applied? (I think this should really be false. However, this is a reversal of the current behaviour, which is always true until a filter has been failed, and passedFilters is quite widely used. A more accurate name would be hasNotFailedFilters.)
  2. Should a Filterable be able to report its actual state - PASSED/FAILED/UNFILTERED? (I vote yes as it is quite explicit and prevents you from having to combine passedFilters and a newly added isUnFiltered Booleans in order to infer a missing failedFilters or add the failedFilters Boolean too)
  3. Given the first two points is there any point in having any other methods in the Filterable interface other than passedFilter(FilterType) and getFilterStatus?

It would be possible to keep the existing behaviour of passedFilters and add getFilterStatus for ease of use and backwards compatibility, at the expense of some potential confusion. But this all depends on what people are using: will any of these changes actually have any direct impact on you? If not, I suggest clean logic should be applied, as this tends to keep things simpler, and simple is good.
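
To make the three-state proposal concrete, here is a minimal sketch. FilterStatus, passedFilter(FilterType) and getFilterStatus follow the names used above. The FilterType values, the FilterableSketch class and the addFilterResult method are hypothetical stand-ins, not the existing Exomiser types.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the real FilterType enum.
enum FilterType { FREQUENCY, TARGET }

// The three states discussed in this issue.
enum FilterStatus { UNFILTERED, PASSED, FAILED }

// Hypothetical Filterable that records which filters it has passed or failed.
final class FilterableSketch {

    private final Set<FilterType> passed = EnumSet.noneOf(FilterType.class);
    private final Set<FilterType> failed = EnumSet.noneOf(FilterType.class);

    void addFilterResult(FilterType type, boolean pass) {
        (pass ? passed : failed).add(type);
    }

    boolean passedFilter(FilterType type) {
        return passed.contains(type);
    }

    // Any failure means FAILED; no results at all means UNFILTERED; otherwise PASSED.
    FilterStatus getFilterStatus() {
        if (!failed.isEmpty()) {
            return FilterStatus.FAILED;
        }
        return passed.isEmpty() ? FilterStatus.UNFILTERED : FilterStatus.PASSED;
    }

    public static void main(String[] args) {
        FilterableSketch variant = new FilterableSketch();
        System.out.println(variant.getFilterStatus());
        variant.addFilterResult(FilterType.FREQUENCY, true);
        System.out.println(variant.getFilterStatus());
    }
}
```

With this shape, the old passedFilters behaviour could be kept as "status is not FAILED", which is the hasNotFailedFilters meaning from point 1.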

Incorporate Jannovar v0.11 + into Exomiser

This will likely be pretty invasive, so ideally it needs to be in version 6.
I'll work with Manuel on this.

Changes will impact on anything involving Variant and VariantEvaluation. (core.filters, core.factories)

Add report feature to flag up variants which Jannovar fails to annotate

Sometimes Jannovar is unable to annotate a variant and throws an exception. This is caught by Exomiser, but the variant is not included in the analysis or results which could lead to incorrect results.

There are two ways this could be handled:

  1. These variants could be flagged and indicated to the user so that they are aware of the issue.
  2. Exomiser should simply stop the analysis and report the reason for the failure.

Votes on behaviour please....
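
Both options can sit behind a single switch, sketched below. Nothing here uses Jannovar itself: the annotator is passed in as a plain function, variants are plain strings, and AnnotationReport, annotateAll and failFast are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical report that keeps the variants the annotator could not handle.
final class AnnotationReport {

    final List<String> annotated = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    // failFast=false: option 1, collect failures so they can be flagged to the user.
    // failFast=true:  option 2, stop the analysis and report the reason.
    static AnnotationReport annotateAll(List<String> variants,
                                        Function<String, String> annotator,
                                        boolean failFast) {
        AnnotationReport report = new AnnotationReport();
        for (String variant : variants) {
            try {
                report.annotated.add(annotator.apply(variant));
            } catch (RuntimeException e) {
                if (failFast) {
                    throw new IllegalStateException("Unable to annotate variant " + variant, e);
                }
                report.failed.add(variant);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Toy annotator that rejects anything starting with "bad".
        Function<String, String> annotator = v -> {
            if (v.startsWith("bad")) {
                throw new IllegalArgumentException("cannot annotate");
            }
            return v + ":annotated";
        };
        AnnotationReport report = annotateAll(Arrays.asList("chr1:1A>T", "bad:1"), annotator, false);
        System.out.println("annotated=" + report.annotated + " failed=" + report.failed);
    }
}
```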

Update intro page

Change the first sentence of the second paragraph to the text below. Also combine the first and second paragraphs so that both are in bold, as they are equally important.

"Variants are prioritized according to user-defined criteria on variant frequency, pathogenicity, quality, inheritance pattern, phenotype data from human and model organisms, and proximity in the interactome to phenotypically similar genes"

Add more informative error message to PhenIX when no HPO terms have been supplied

Child of issue #37
Related to issues #47 and #48

The error message is really confusing. It gives the impression that some files are missing. Maybe we should add a clearer error, such as: Please insert HPO terms for method phenix.

Or convert the OMIM-ID to HPO-terms.

/home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out (No such file or directory)
    at de.charite.compbio.exomiser.core.prioritisers.util.ScoreDistributionContainer.parseDistributions(ScoreDistributionContainer.java:175)
    at de.charite.compbio.exomiser.core.prioritisers.PhenixPriority.<init>(PhenixPriority.java:153)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.getPhenixPrioritiser(PriorityFactory.java:82)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.makePrioritisers(PriorityFactory.java:54)
    at de.charite.compbio.exomiser.core.Exomiser.analyse(Exomiser.java:72)
    at de.charite.compbio.exomiser.cli.Main.runAnalysis(Main.java:130)
    at de.charite.compbio.exomiser.cli.Main.main(Main.java:62)
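
A check along these lines, run before the PhenIX prioritiser is created, would replace the file-not-found trace above with the suggested message. This is a sketch only: the class and method names are hypothetical, and where such a check would belong in the existing code is not decided here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical guard to call before building the PhenIX prioritiser.
final class PhenixInputCheck {

    static void requireHpoTerms(List<String> hpoIds) {
        if (hpoIds == null || hpoIds.isEmpty()) {
            throw new IllegalArgumentException(
                    "Please insert HPO terms for method phenix: none were supplied.");
        }
    }

    public static void main(String[] args) {
        requireHpoTerms(Arrays.asList("HP:0001156")); // passes silently
        try {
            requireHpoTerms(Collections.<String>emptyList());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```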

Filter for genes update

Allow pasting in of large list of genes or upload of a file e.g. the DDD list of developmental disorder genes

remove-off-target-syn option description is confusing

Does specifying this option remove off-target variants, or keep them? And is the default to keep them or remove them? It isn't quite clear.

 -T,--remove-off-target-syn             Keep off-target variants. These
                                        are defined as intergenic,
                                        intronic, upstream, downstream,
                                        synonymous or intronic ncRNA
                                        variants. Default: true

Clean-up package structure

The package structure in exomiser-core is still a bit out of kilter. I want to move everything under the core package so there can be no possibility of namespace clashes with classes from other exomiser jar files.

This will necessitate a full version number increment as it will break any existing code relying on exomiser-core.

The structure should be so:

  • core
    • Exomiser.java
    • ExomiserSettings.java
    • dao
    • factories
    • filter(s) (rename?)
    • frequency (move into model?)
    • model
    • pathogenicity (move into model?)
    • util (might not be needed)
    • writer(s) (rename?)
    • io (might not be needed)
    • priority (rename to prioritisers)
      • exomewalker
      • inheritance
      • omim
      • ...

Add licence and file headers for Maven Central hosting

Maven (naturally) has a plugin for this. Investigate and then follow through with what's needed to add/update the licence headers in the source code:

http://mojo.codehaus.org/license-maven-plugin

Also, in preparation for fully opening up the codebase, it would be good to add the minimum headers required by the Maven Central Repository:
http://central.sonatype.org/pages/requirements.html

Then we can build on TravisCI and publish builds to Maven Central, making it trivially easy for Java developers to use Exomiser.

Running cli jar without any arguments should display help

As per the protocols manuscript, running the jar without any arguments should display the cli help (currently only displayed if -help or --help is provided):

To test whether the installation was successful, run the command
$java -jar exomiser-cli-5.0.1.jar 
If the installation was successful, you will see a help message.
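
The check itself is small. A sketch, assuming the decision is made on the raw argument array before any option parsing (the class and method names are hypothetical):

```java
// Hypothetical helper: decides whether the CLI should print its help text.
final class CliHelpCheck {

    static boolean shouldShowHelp(String[] args) {
        if (args.length == 0) {
            return true; // no arguments at all: show help, as the manuscript describes
        }
        for (String arg : args) {
            if (arg.equals("-help") || arg.equals("--help")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldShowHelp(args));
    }
}
```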

Fix dbSNP parsing

dbSNP seems to have changed its format recently such that alternate alleles appear in the same row with allele frequencies reported for the ref and these alts in order e.g.

9 140777306 rs4422842 C G,T,A . . RS=4422842;RSPOS=140777306;RV;dbSNPBuildID=111;SSR=0;SAO=0;VP=0x050128000a0514012e000100;WGT=1;VC=SNV;PM;PMC;SLO;NSM;REF;ASP;VLD;GNO;KGPhase3;CAF=0.846,.,.,0.154;COMMON=1

This needs to be processed to produce the following rows in our frequency table:
9 140777306 rs4422842 C G .
9 140777306 rs4422842 C T .
9 140777306 rs4422842 C A 0.154
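
The pairing of ALT alleles with CAF values can be sketched as below, assuming (as in the example row) that CAF lists the REF frequency first and then one value per ALT in order, with "." meaning no value. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pairs each ALT allele with its entry in the CAF field.
final class CafParser {

    // Returns one {alt, frequency} pair per ALT allele; "." is kept for missing values.
    static List<String[]> altFrequencies(String altField, String cafField) {
        String[] alts = altField.split(",");
        String[] cafs = cafField.split(",");
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < alts.length; i++) {
            // cafs[0] belongs to REF, so the value for ALT i sits at index i + 1.
            String freq = (i + 1 < cafs.length) ? cafs[i + 1] : ".";
            rows.add(new String[] {alts[i], freq});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : altFrequencies("G,T,A", "0.846,.,.,0.154")) {
            System.out.println("9 140777306 rs4422842 C " + row[0] + " " + row[1]);
        }
    }
}
```

Run on the rs4422842 row, this gives "." for G and T and 0.154 for A, matching the three rows listed above.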

Automatic integration tests

These are really needed to catch issues arising from changes in underlying dependencies such as Jannovar and HTSJDK. A subtle change to how a variant is represented can drastically change the outcome of an analysis.

Fail to run --prioritiser phenix

I just pulled the newest development branch and packaged it with Maven. I also used the current data from the FTP website.

My command:

java -Xms5g -Xmx5g -jar exomiser-cli-6.0.0.jar \
--prioritiser=exomiser-allspecies -I AR -F 1 -D 607060 \
-v testVCF.vcf -o results/testresult \
--out-format=HTML \
--prioritiser phenix \
-p testPED.ped

The error is:

/home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out (No such file or directory)
    at de.charite.compbio.exomiser.core.prioritisers.util.ScoreDistributionContainer.parseDistributions(ScoreDistributionContainer.java:175)
    at de.charite.compbio.exomiser.core.prioritisers.PhenixPriority.<init>(PhenixPriority.java:153)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.getPhenixPrioritiser(PriorityFactory.java:82)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.makePrioritisers(PriorityFactory.java:54)
    at de.charite.compbio.exomiser.core.Exomiser.analyse(Exomiser.java:72)
    at de.charite.compbio.exomiser.cli.Main.runAnalysis(Main.java:130)
    at de.charite.compbio.exomiser.cli.Main.main(Main.java:62)

The path is correctly set. /home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/1.out exists but not /home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out. 0.out is not present in the download file.

Prioritiser options

  • Make exomiser v2 the default
  • Change Exomiser v1 to Exomiser (mouse only)
  • Change Exomiser v2 to Exomiser (all species)
  • Do we want to offer Phenix (human only) as well
  • Do we want to offer ExomeWalker

TSV-writer appears to substitute any 'vcf' in file path

It appears that anywhere the string vcf appears in the path of the output file, it is substituted with genes.tsv, even if it doesn't appear at the end (note that the vcf directory is changed to a genes.tsv directory which doesn't exist):

2015-02-09 02:24:01,429 INFO  de.charite.compbio.exomiser.core.writers.VcfResultsWriter [main] - VCF results written to file /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf.
2015-02-09 02:24:01,431 ERROR de.charite.compbio.exomiser.core.writers.TsvGeneResultsWriter [main] - Unable to write results to file /dupa-filer/buske/phenomecentral/geno/genes.tsv/F0000009/F0000009.genes.tsv.
java.nio.file.NoSuchFileException: /dupa-filer/buske/phenomecentral/geno/genes.tsv/F0000009/F0000009.genes.tsv
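
A plain String.replace of "vcf" with "genes.tsv" on the VCF path reproduces exactly the path in the error above, although the writer's actual code is not shown here, so that is only a guess at the cause. A sketch that swaps only a trailing ".vcf" extension instead (the class and method names are hypothetical):

```java
// Hypothetical helper: derives the genes TSV file name from the VCF output path.
final class OutputNames {

    static String genesTsvName(String vcfPath) {
        String suffix = ".vcf";
        // Only strip the extension at the very end; directory names are left alone.
        String base = vcfPath.endsWith(suffix)
                ? vcfPath.substring(0, vcfPath.length() - suffix.length())
                : vcfPath;
        return base + ".genes.tsv";
    }

    public static void main(String[] args) {
        String path = "/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf";
        System.out.println(genesTsvName(path));
    }
}
```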

Add option to filter VCF output to PASS entries

This could simplify the handling of, for example, whole-genome VCF files. We would use this feature in PhenomeCentral. As is, we just have to run the VCF files through a separate AWK step.
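
As a rough picture of what such an option would do, the sketch below keeps header lines and records whose FILTER column (the seventh tab-separated field) is exactly PASS. It works on raw VCF lines rather than on Exomiser's own variant objects, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical line-based filter that keeps VCF headers and PASS records only.
final class PassFilter {

    static boolean keep(String line) {
        if (line.startsWith("#")) {
            return true; // meta-information and column header lines are always kept
        }
        String[] cols = line.split("\t");
        return cols.length > 6 && cols[6].equals("PASS");
    }

    static List<String> passOnly(List<String> lines) {
        return lines.stream().filter(PassFilter::keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "##fileformat=VCFv4.1",
                "chr1\t1\t.\tA\tT\t2.2\tPASS\t.",
                "chr1\t2\t.\tT\tC\t2.2\tTarget\t.");
        passOnly(lines).forEach(System.out::println);
    }
}
```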

Enable filtering of whole-genome VCF files

We can't do this at present because the entire VCF file is read into memory, and each record is converted to a Variant and annotated using Jannovar. Only once this is done are the variants collected into their relevant genes and then filtered.

The VCF parsing, annotation and filtering need to be streamed into a whole-exon set first. Then we can continue on our merry way without requiring tens of gigs of RAM and hours of time.

TAB delimited output format (tsv) for variants

You wrote in the exomiser draft protocol that there is a TAB-delimited file format. Right now one exists only for genes. I think it would be great for people using pipelines to have a TSV file with the variants and all the annotations (annotations are still missing from the VCF file).

I can start implementing this feature if it is OK with you.

Collapsible pheno-evidence sections

Results get a bit crazy looking when you have a lot of input HPO terms e.g. the classic Pfeiffer example.

Can we make each evidence section collapsible, or only show the one with the best score, i.e. the one contributing to the combined Exomiser score? I prefer the former, as we may combine all scores eventually.

Can the io.html package be removed now?

This package is now redundant for the core exomiser-cli functionality. The latest changes in release 5.2.0 use Thymeleaf to render the HTML: the HtmlWriter simply hands the context with the data it needs to the rendering engine, which fills in resources/html/templates/results.html with this data.

@pnrobinson - Do you still need it for Panel, CRE and Walker? If not, then this package has served its purpose and it's time to retire it from the codebase. Let me know and I'll take care of it.
