exomiser / exomiser Goto Github PK

A Tool to Annotate and Prioritize Exome Variants

Home Page: https://exomiser.readthedocs.io

License: GNU Affero General Public License v3.0

Java 94.74% HTML 5.26% Shell 0.01%

analysis exome genomics monarchinitiative phenotypes variants

exomiser's Issues

Handling multiple alternative alleles

I discussed this earlier with @pnrobinson and he proposed/we agreed on the following (if I remember correctly). I'm adding this issue so we are all on the same page here and for some further sanity checking.

Consider the case of having multiple alternative alleles with REF=A, ALT=T,G. In our individuals, we see the following:

A/A A/T T/G

Exomiser will interpret this as two variants:

0/0 0/1 0/1  ==  wt  het het
0/0 0/0 0/1  ==  wt  wt  het
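A minimal sketch of the projection described above, assuming genotypes are available as VCF allele indices (0 = REF, 1 = first ALT, 2 = second ALT). The class and method names are hypothetical and are not taken from the Exomiser code base.

```java
import java.util.Arrays;

/** Hypothetical helper: projects a multi-allelic genotype onto a single ALT allele. */
final class MultiAllelicGenotypes {

    private MultiAllelicGenotypes() {
    }

    /**
     * @param alleleIndices VCF allele indices for one sample, e.g. {1, 2} for T/G when ALT=T,G
     * @param altIndex      1-based index of the ALT allele being considered
     * @return a genotype string in which only the chosen ALT counts as 1 and every other allele as 0
     */
    static String project(int[] alleleIndices, int altIndex) {
        int[] projected = new int[alleleIndices.length];
        for (int i = 0; i < alleleIndices.length; i++) {
            projected[i] = alleleIndices[i] == altIndex ? 1 : 0;
        }
        // Genotypes are treated as unphased here, so write 0/1 rather than 1/0.
        Arrays.sort(projected);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < projected.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(projected[i]);
        }
        return sb.toString();
    }
}
```

With REF=A, ALT=T,G, calling `project(new int[]{1, 2}, 1)` and `project(new int[]{1, 2}, 2)` both give `0/1` for the T/G individual, which matches the two rows above.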

Exomiser parsing of indels is incorrect?

It seems that indels are not being parsed correctly, which causes the AF and dbSNP lookups to fail.

For example, the VCF file contains:
chr11 61165731 . C CA

This results in the following annotation in the exomiser output: chr11:g.61165731->A, which is incorrect. It should be g.6116573**2**

The output reports no frequency data, but this variant is actually rs11382548 with a MAF of 14%.

Not sure how common this is? Can anyone confirm?
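For illustration, here is a sketch of one way to get the position reported above: strip the bases shared at the start of REF and ALT and shift the position by the same amount. This is an assumption about the intended convention, not the actual Exomiser or Jannovar parsing code, and all names are hypothetical.

```java
/** Hypothetical helper: trims the shared leading bases of a VCF REF/ALT pair. */
final class IndelTrimmer {

    /** Simple value holder for the trimmed coordinates. */
    static final class Trimmed {
        final int pos;
        final String ref;
        final String alt;

        Trimmed(int pos, String ref, String alt) {
            this.pos = pos;
            this.ref = ref;
            this.alt = alt;
        }
    }

    private IndelTrimmer() {
    }

    static Trimmed trimSharedPrefix(int pos, String ref, String alt) {
        int shared = 0;
        int max = Math.min(ref.length(), alt.length());
        while (shared < max && ref.charAt(shared) == alt.charAt(shared)) {
            shared++;
        }
        String trimmedRef = ref.substring(shared);
        String trimmedAlt = alt.substring(shared);
        // An empty allele is written as "-", as in the annotation quoted above.
        return new Trimmed(pos + shared,
                trimmedRef.isEmpty() ? "-" : trimmedRef,
                trimmedAlt.isEmpty() ? "-" : trimmedAlt);
    }
}
```

For the record `chr11 61165731 . C CA` this yields position 61165732 with REF `-` and ALT `A`. SNVs share no leading bases and are returned unchanged.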

Make OntologyService class for managing this aspect for Prioritisers

Required by Phive, HiPhive and Phenix too.

Currently this is handled by code-duplication.

Resolve styling clash with Sanger css

Can we add in our own Sanger header and footer?

Fix HTML output from OMIM prioritiser

Currently the output is already pre-marked-up in HTML, so the raw markup is visible directly in the output:

<a href="http://www.omim.org/entry/-10">Craniosynostosis</a>

should look like:

Craniosynostosis

Reformat files to use unix newlines

I would suggest having the source base use unix newlines instead of windows newlines. Having the windows newlines makes it harder to work with github, if nothing else. For example, if you try to edit a file (e.g. exomiser-cli/src/main/resources/jdbc.properties) within github, you'll see that the diff is the entire file because the ^M characters at the end of every line get automatically stripped.

Make h2 access read-only to prevent db locking

It seems that, by default, the jdbc connection will update the h2 database during normal Exomiser runs (presumably just last-accessed fields and whatnot). This causes problems when attempting to access the h2 database concurrently, since this then requires table locking, and ultimately can result in the database entering an inconsistent state with uncommitted changes if an Exomiser run crashes. Once that happens Exomiser can no longer run until you manually go in and drop uncommitted changes in the db.

There is an easy fix that has worked for us. Adding the following to jdbc.properties:
ACCESS_MODE_DATA=r
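If the setting has to be applied in code rather than in the properties file, a sketch along these lines might work. It assumes the option ends up as a `;`-separated setting on the H2 JDBC URL. The helper name and the example path are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: makes sure an H2 JDBC URL opens the data files read-only. */
final class H2Urls {

    private H2Urls() {
    }

    static String readOnly(String jdbcUrl) {
        // Leave the URL alone if an access mode has already been chosen.
        if (jdbcUrl.toUpperCase(Locale.ROOT).contains("ACCESS_MODE_DATA=")) {
            return jdbcUrl;
        }
        return jdbcUrl + ";ACCESS_MODE_DATA=r";
    }
}
```

For example, `H2Urls.readOnly("jdbc:h2:file:/data/exomiser")` returns `jdbc:h2:file:/data/exomiser;ACCESS_MODE_DATA=r`.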

Behaviour of a Filterable object

Since issue #2 the VcfWriter output is now more compatible with the actual VCF spec. In particular, if a variant has not been filtered, the FILTER column should be empty for that variant:

##fileformat=VCFv4.1
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  GENOTYPE
chr3    4   .   G   C   2.2     ;VARIANT NOT ANALYSED - NO GENE ANNOTATIONS GT  0/1
chr1    1   .   A   T   2.2 PASS    ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr1    2   .   T   -   2.2 Target  ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr2    3   .   C   T   2.2 Frequency;Target    ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1
chr3    5   .   G   C   2.2     ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0   GT  0/1

However this has raised a question about the Filterable interface which I'd like some feedback on as it impacts a fundamental behaviour. Currently a Filterable either passes or fails, but really it can exist in one of three states - passed, failed and unfiltered/not yet filtered. Given this:

Should a Filterable (VariantEvaluation or Gene) report true or false to passedFilters if no filters have been applied? (I think this should really be false. However, this is a reversal of the current behaviour, which is always true until a filter has been failed, and passedFilters is quite well used. A more accurate name would be hasNotFailedFilters.)
Should a Filterable be able to report its actual state - PASSED/FAILED/UNFILTERED? (I vote yes, as it is quite explicit and saves you from having to combine passedFilters with a newly added isUnFiltered Boolean in order to infer a missing failedFilters, or from adding a failedFilters Boolean too.)
Given the first two points, is there any point in having any methods in the Filterable interface other than passedFilter(FilterType) and getFilterStatus?

It would be possible to keep the existing behaviour of passedFilters and add getFilterStatus for ease of use and backwards compatibility, at the expense of some potential confusion. But this all depends on what people are using - will any of these changes actually have any direct impact on you? If not, I suggest clean logic should be applied, as this tends to keep things simpler, and simple is good.
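To make the three-state proposal concrete, here is a rough sketch. Only `passedFilter(FilterType)` and `getFilterStatus` come from the discussion above. The enum values in `SketchFilterType`, the `addResult` method and the class name are placeholders, not the real Exomiser types.

```java
import java.util.EnumSet;
import java.util.Set;

/** The three states discussed above. */
enum FilterStatus { PASSED, FAILED, UNFILTERED }

/** Placeholder for the real FilterType enum. */
enum SketchFilterType { FREQUENCY, TARGET }

/** Hypothetical Filterable that tracks passed and failed filters separately. */
final class FilterableSketch {

    private final Set<SketchFilterType> passed = EnumSet.noneOf(SketchFilterType.class);
    private final Set<SketchFilterType> failed = EnumSet.noneOf(SketchFilterType.class);

    /** Records the outcome of running one filter. */
    void addResult(SketchFilterType type, boolean pass) {
        if (pass) {
            passed.add(type);
        } else {
            failed.add(type);
        }
    }

    /** True only if this specific filter has been run and passed. */
    boolean passedFilter(SketchFilterType type) {
        return passed.contains(type);
    }

    FilterStatus getFilterStatus() {
        if (passed.isEmpty() && failed.isEmpty()) {
            return FilterStatus.UNFILTERED; // nothing has been applied yet
        }
        return failed.isEmpty() ? FilterStatus.PASSED : FilterStatus.FAILED;
    }
}
```

With this shape a caller never has to combine two Booleans to work out whether a variant was filtered at all, and the VCF writer could leave the FILTER column empty for UNFILTERED.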

Incorporate Jannovar v0.11 + into Exomiser

This will likely be pretty invasive so needs to ideally be in version 6.
I'll work with Manuel on this.

Changes will impact on anything involving Variant and VariantEvaluation. (core.filters, core.factories)

Add a gene list filter to the submit page

Could also add an option to use some pre-canned lists e.g. the DDD list of developmental genes

Add report feature to flag up variants which Jannovar fails to annotate

Sometimes Jannovar is unable to annotate a variant and throws an exception. This is caught by Exomiser, but the variant is not included in the analysis or results which could lead to incorrect results.

There are two ways this could be handled:

These variants could be flagged and indicated to the user so that they are aware of the issue.
Exomiser should simply stop the analysis and report the reason for the failure.

Votes on behaviour please....

Add data input validation from exomiser/submit on client-side

Bootstrap has some javascript stuff for this - see http://getbootstrap.com/javascript/ for inspiration...

Feedback after hitting submit

Make results available in Excel format

Is this a good idea?

Change results page links to Ensembl

Link to Ensembl for variants and genes - we should make an effort to be more compatible with the rest of the campus!

Update intro page

Change first sentence of second paragraph to the below and also combine 1st and 2nd paragraph so all in bold as all equally important

"Variants are prioritized according to user-defined criteria on variant frequency, pathogenicity, quality, inheritance pattern, phenotype data from human and model organisms, and proximity in the interactome to phenotypically similar genes"

Add more informative error message to PhenIX when no HPO terms have been supplied

Child of issue #37
Related to issues #47 and #48

The error message is really confusing. It gives the impression that some files are missing. Maybe we should add a better error. Like: Please insert HPO terms for method phenix.

Or convert the OMIM-ID to HPO-terms.

/home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out (No such file or directory)
    at de.charite.compbio.exomiser.core.prioritisers.util.ScoreDistributionContainer.parseDistributions(ScoreDistributionContainer.java:175)
    at de.charite.compbio.exomiser.core.prioritisers.PhenixPriority.<init>(PhenixPriority.java:153)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.getPhenixPrioritiser(PriorityFactory.java:82)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.makePrioritisers(PriorityFactory.java:54)
    at de.charite.compbio.exomiser.core.Exomiser.analyse(Exomiser.java:72)
    at de.charite.compbio.exomiser.cli.Main.runAnalysis(Main.java:130)
    at de.charite.compbio.exomiser.cli.Main.main(Main.java:62)

CLI options for --disease-id and --hpo-terms should be mutually exclusive

Spawned from issue #37

--disease-id and --hpo-terms should be mutually exclusive. - Check this doesn't break HiPHIVE.
--disease-id should be converted into a set of HPO terms such that the prioritisers only work off HPO terms as they already do internally. See issue #47.
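A minimal sketch of the exclusivity check, written as plain Java so that it does not assume anything about how the CLI parser is wired up. The class and method names are hypothetical. If the command-line library in use supports option groups, that mechanism could do the same job.

```java
import java.util.List;

/** Hypothetical validation step run after the command line has been parsed. */
final class PhenotypeOptionsCheck {

    private PhenotypeOptionsCheck() {
    }

    /**
     * @param diseaseId value of --disease-id, or null if it was not supplied
     * @param hpoTerms  values of --hpo-terms, or null/empty if none were supplied
     * @throws IllegalArgumentException if both options were supplied
     */
    static void requireExclusive(String diseaseId, List<String> hpoTerms) {
        boolean hasDisease = diseaseId != null && !diseaseId.isEmpty();
        boolean hasHpo = hpoTerms != null && !hpoTerms.isEmpty();
        if (hasDisease && hasHpo) {
            throw new IllegalArgumentException(
                    "--disease-id and --hpo-terms are mutually exclusive; please supply only one");
        }
    }
}
```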

Filter for genes update

Allow pasting in of large list of genes or upload of a file e.g. the DDD list of developmental disorder genes

remove-off-target-syn option description is confusing

Does specifying this option remove off-target variants, or keep them? And is the default to keep them or remove them? It isn't quite clear.

 -T,--remove-off-target-syn             Keep off-target variants. These
                                        are defined as intergenic,
                                        intronic, upstream, downstream,
                                        synonymous or intronic ncRNA
                                        variants. Default: true

Add ReadTheDocs documentation to GitHub wiki

This is a nice feature in Jannovar - please can you set this up for Exomiser too.

Clean-up package structure

The package structure in exomiser-core is still a bit out of kilter. I want to move everything under the core package so there can be no possibility of namespace clashes with classes from other exomiser jar files.

This will necessitate a full version number increment as it will break any existing code relying on exomiser-core.

The structure should be so:

core
- Exomiser.java
- ExomiserSettings.java
- dao
- factories
- filter(s) (rename?)
- frequency (move into model?)
- model
- pathogenicity (move into model?)
- util (might not be needed)
- writer(s) (rename?)
- io (might not be needed)
- priority (rename to prioritisers)
  - exomewalker
  - inheritance
  - omim
  - ...

Back to query form button

Do we need this instead of users just using the back button? Maybe it is just me.

Exomiser v2 - Show phenotype evidence in results page.

Handle displaying of the phenotype evidence. On the current site it appears in a pop-up.

Vestigial mention of phive-allspecies in cli help

s/phive-allspecies/hiphive/g

-E,--hiphive-params <type>              Comma separated list of optional
                                         parameters for phive-allspecies

Add licence and file headers for Maven Central hosting

Maven (naturally) has a plugin for this. Investigate and then follow through with what's needed to add/update the licence headers in the source code:

http://mojo.codehaus.org/license-maven-plugin

Also, in preparation for fully-opening up the codebase it would be good to add the minimum headers as required by maven Central Repository:
http://central.sonatype.org/pages/requirements.html

Then we can build on TravisCI and publish builds to maven central making it trivially easy for Java developers to use Exomiser.

Remove the evil that is Jannovar.Constants

This is java heresy - an interface used as an enum. Such horrific misuse of the language must be purged with fire.

Running cli jar without any arguments should display help

As per the protocols manuscript, running the jar without any arguments should display the cli help (currently only displayed if -help or --help is provided):

To test whether the installation was successful, run the command
$java -jar exomiser-cli-5.0.1.jar 
If the installation was successful, you will see a help message.

Exomiser v2 - make sure it stays within the memory limits of the dev and production servers

Get it running and make sure it stays within the memory limits of the dev and production servers - requires the DataMatrix object to be a singleton. Can we use floats instead of doubles?
Handle displaying of the phenotype evidence. On current site appears in a pop-up

Fix dbSNP parsing

dbSNP seems to have changed its format recently such that alternate alleles appear in the same row with allele frequencies reported for the ref and these alts in order e.g.

9 140777306 rs4422842 C G,T,A . . RS=4422842;RSPOS=140777306;RV;dbSNPBuildID=111;SSR=0;SAO=0;VP=0x050128000a0514012e000100;WGT=1;VC=SNV;PM;PMC;SLO;NSM;REF;ASP;VLD;GNO;KGPhase3;CAF=0.846,.,.,0.154;COMMON=1

This needs to be processed to result in our frequency table
9 140777306 rs4422842 C G .
9 140777306 rs4422842 C T .
9 140777306 rs4422842 C A 0.154
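The split could be sketched as below, assuming (as in the example row) that the first CAF value belongs to the reference allele and the remaining values follow the ALT alleles in order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands one multi-ALT dbSNP record into one row per ALT allele. */
final class DbSnpCafSplitter {

    private DbSnpCafSplitter() {
    }

    /**
     * @param altField comma-separated ALT alleles, e.g. "G,T,A"
     * @param cafValue value of the CAF INFO key, e.g. "0.846,.,.,0.154"
     * @return rows of the form "CHROM POS ID REF ALT FREQ"
     */
    static List<String> split(String chrom, String pos, String id, String ref,
                              String altField, String cafValue) {
        String[] alts = altField.split(",");
        String[] caf = cafValue.split(",");
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < alts.length; i++) {
            // caf[0] is the REF frequency, so the i-th ALT is found at caf[i + 1].
            String freq = (i + 1 < caf.length) ? caf[i + 1] : ".";
            rows.add(String.join(" ", chrom, pos, id, ref, alts[i], freq));
        }
        return rows;
    }
}
```

Calling `split("9", "140777306", "rs4422842", "C", "G,T,A", "0.846,.,.,0.154")` produces the three rows listed above.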

Defaults vs suggestions for fields

Find the grey colouring of the suggested values a bit confusing. Seems like they will be applied as defaults.

Re-work prioritisers to add tests and clean-up API

Download facility from website

If we do this it should be all results though i.e. not just top 200

Automatic integration tests

These are really needed to catch issues arising from changes in underlying dependencies such as Jannovar and HTSJDK, where a subtle change to a variant can cause drastic changes to the outcome of an analysis.

Fail to run --prioritiser phenix

I just pulled the newest development branch and packed it with maven. I also used the actual data from the FTP website.

My command:

java -Xms5g -Xmx5g -jar exomiser-cli-6.0.0.jar \
--prioritiser=exomiser-allspecies -I AR -F 1 -D 607060 \
-v testVCF.vcf -o results/testresult \
--out-format=HTML \
--prioritiser phenix \
-p testPED.ped

The error is:

/home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out (No such file or directory)
    at de.charite.compbio.exomiser.core.prioritisers.util.ScoreDistributionContainer.parseDistributions(ScoreDistributionContainer.java:175)
    at de.charite.compbio.exomiser.core.prioritisers.PhenixPriority.<init>(PhenixPriority.java:153)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.getPhenixPrioritiser(PriorityFactory.java:82)
    at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.makePrioritisers(PriorityFactory.java:54)
    at de.charite.compbio.exomiser.core.Exomiser.analyse(Exomiser.java:72)
    at de.charite.compbio.exomiser.cli.Main.runAnalysis(Main.java:130)
    at de.charite.compbio.exomiser.cli.Main.main(Main.java:62)

The path is correctly set. /home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/1.out exists but not /home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out. 0.out is not present in the download file.

Better error handling for large files

Put a warning on the form about VCF files needing to be < 60Mb.

Prioritiser options

Make exomiser v2 the default
Change Exomiser v1 to Exomiser (mouse only)
Change Exomiser v2 to Exomiser (all species)
Do we want to offer Phenix (human only) as well?
Do we want to offer ExomeWalker?

TSV-writer appears to substitute any 'vcf' in file path

It appears that anywhere the string vcf appears in the path of the output file, it is substituted with genes.tsv, even if it doesn't appear at the end (note that the vcf directory is changed to a genes.tsv directory which doesn't exist):

2015-02-09 02:24:01,429 INFO  de.charite.compbio.exomiser.core.writers.VcfResultsWriter [main] - VCF results written to file /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf.
2015-02-09 02:24:01,431 ERROR de.charite.compbio.exomiser.core.writers.TsvGeneResultsWriter [main] - Unable to write results to file /dupa-filer/buske/phenomecentral/geno/genes.tsv/F0000009/F0000009.genes.tsv.
java.nio.file.NoSuchFileException: /dupa-filer/buske/phenomecentral/geno/genes.tsv/F0000009/F0000009.genes.tsv
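One possible fix is to swap the extension on the file name only and leave the parent directories alone. The sketch below is an assumption about how that could look and is not the current TsvGeneResultsWriter code. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: derives an output file name without touching the directory part. */
final class OutputFileNames {

    private OutputFileNames() {
    }

    static Path withExtension(Path outFile, String newExtension) {
        String fileName = outFile.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // Only the text after the last dot of the file name is replaced.
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        return outFile.resolveSibling(base + "." + newExtension);
    }

    /** Usage example based on the log lines above. */
    static Path example() {
        Path vcf = Paths.get("/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf");
        return withExtension(vcf, "genes.tsv");
    }
}
```

For the path in the log this keeps the `vcf` directory and returns `.../geno/vcf/F0000009/F0000009.genes.tsv`.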

Add option to filter VCF output to PASS entries

This could simplify the handling of, for example, whole-genome VCF files. We would use this feature in PhenomeCentral. As is, we just have to run the VCF files through a separate AWK step.
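As a rough idea of what such an option would do, here is a sketch of a line-level filter that keeps header lines and records whose FILTER column is exactly PASS. It assumes tab-separated VCF lines. The class name is hypothetical and this is not an existing Exomiser option.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: keeps VCF header lines and records whose FILTER column is PASS. */
final class PassOnlyVcf {

    private PassOnlyVcf() {
    }

    static boolean keep(String line) {
        if (line.startsWith("#")) {
            return true; // meta-information and column header lines
        }
        // FILTER is the 7th tab-separated column; there is no need to split beyond INFO.
        String[] fields = line.split("\t", 8);
        return fields.length > 6 && "PASS".equals(fields[6]);
    }

    static List<String> filter(List<String> lines) {
        return lines.stream().filter(PassOnlyVcf::keep).collect(Collectors.toList());
    }
}
```

The same predicate could be applied to a stream of lines read from the file, so the whole VCF need not be held in memory.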

Enable filtering of whole-genome VCF files

We can't do this at present because the entire VCF file is read into memory, and each record is converted to a Variant and annotated using Jannovar. Once this is done the variants are collected into their relevant genes and then filtered.

The VCF parsing, annotation and filtering need to be streamed into a whole-exon set first; then we can continue on our merry way without requiring tens of gigs of RAM and hours of time.

Replacement of pathogenicity scores with a single CADD score

Add data input validation from exomiser/submit on server-side

Automate build and deploy processes

This is clunky at the moment and relies on a few manual steps to pull everything together and deploy.

This could be achieved by setting up a GO CD server:

http://www.go.cd/download/

We need to ask systems for a VM to deploy this to.

Replace hard-coded HTML PriorityScore output with object representation of the data

Start with ExomiserAllSpeciesPriority as this has a huge amount of display logic embedded within the prioritisation logic making the actual algorithm rather hard to see.

TAB delimited output format (tsv) for variants

You wrote in the Exomiser draft protocol that there is a TAB-delimited file format. Right now one exists only for genes. I think for people using pipelines it would be great to have a TSV file with the variants and all their annotations (annotations are still missing from the VCF file).

I can start implementing this feature if it is OK with you.

Enable filtering with no prioritisation

The genes should be scored using just the variant score, i.e., add a gene (phenotype) score of zero to every gene, everything else is the same.

Collapsible pheno-evidence sections

Results get a bit crazy looking when you have a lot of input HPO terms e.g. the classic Pfeiffer example.

Can we make each evidence section collapsible, or only show the one with the best score, i.e. the one contributing to the combined Exomiser score? I prefer the former, as we may combine all scores eventually.

Guarantee that GenomeChange is always non-null.

Can the io.html package be removed now?

This package has now been rendered redundant for the core exomiser-cli functionality as the latest changes in release 5.2.0 are using Thymeleaf to render the HTML and the HtmlWriter simply hands the context with the data it needs to the rendering engine which fills in the resources/html/templates/results.html with this data.

@pnrobinson - Do you still need it for Panel, CRE and Walker? If not, then this package has served its purpose and it's time to retire it from the codebase. Let me know and I'll take care of it.

Make 'none' type Prioritiser the default in SettingsBuilder

This will make using the Exomiser to do the filtering the default and only run the prioritisation step if explicitly specified.
