exomiser / exomiser
A Tool to Annotate and Prioritize Exome Variants
Home Page: https://exomiser.readthedocs.io
License: GNU Affero General Public License v3.0
I discussed this earlier with @pnrobinson and he proposed/we agreed on the following (if I remember correctly). I'm adding this issue so we are all on the same page here and for some further sanity checking.
Consider the case of multiple alternative alleles with REF=A, ALT=T,G. In our individuals, we see the following:
A/A A/T T/G
Exomiser will interpret this as two variants:
0/0 0/1 0/1 == wt het het
0/0 0/0 0/1 == wt wt het
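A minimal sketch of the split described above, in Python for brevity. The function names are hypothetical and are not Exomiser or Jannovar API. It assumes genotypes are given as VCF allele indices (A/A, A/T, T/G become 0/0, 0/1, 1/2) and does not handle missing alleles (`.`).

```python
def split_genotype(genotype, alt_index):
    """Re-express one sample's genotype relative to a single ALT allele.

    Alleles equal to alt_index become 1; every other allele (REF or a
    different ALT) becomes 0. The result is sorted so a het is always 0/1.
    """
    alleles = [int(a) for a in genotype.replace("|", "/").split("/")]
    recoded = sorted(1 if a == alt_index else 0 for a in alleles)
    return "/".join(str(a) for a in recoded)


def split_multiallelic(alts, genotypes):
    """Return one (alt, per-sample genotypes) record per ALT allele."""
    return [
        (alt, [split_genotype(gt, index) for gt in genotypes])
        for index, alt in enumerate(alts, start=1)
    ]


# REF=A, ALT=T,G with samples A/A, A/T, T/G
records = split_multiallelic(["T", "G"], ["0/0", "0/1", "1/2"])
for alt, sample_genotypes in records:
    print(alt, " ".join(sample_genotypes))
```

For the example above this yields `0/0 0/1 0/1` for T and `0/0 0/0 0/1` for G, matching the two variants listed.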
It seems that indels are not being parsed correctly, causing the AF and dbSNP lookups to fail.
For example, the VCF file contains:
chr11 61165731 . C CA
This results in the following annotation in the Exomiser output: chr11:g.61165731->A, which is incorrect. It should be g.6116573**2**.
The output reports no frequency data, but this is actually rs11382548 with a MAF of 14%.
Not sure how common this is? Can anyone confirm?
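One way to arrive at the position the report expects is to trim the base shared by REF and ALT and shift the position accordingly. The sketch below is an illustration in Python; `trim_common_prefix` is a hypothetical helper, and whether Exomiser or Jannovar normalise indels this way is an assumption.

```python
def trim_common_prefix(pos, ref, alt):
    """Strip bases shared at the start of REF and ALT, shifting POS to match.

    An allele that becomes empty is written as "-", the notation used in
    the annotation quoted above.
    """
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref or "-", alt or "-"


# The record from the report: chr11 61165731 . C CA
pos, ref, alt = trim_common_prefix(61165731, "C", "CA")
print(f"chr11:g.{pos}{ref}>{alt}")
```

The VCF position refers to the anchoring base C, so after trimming, the inserted A is reported at 61165732 rather than 61165731. A lookup keyed on the untrimmed position would miss the dbSNP entry.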
Required by Phive, HiPhive and Phenix too.
Currently this is handled by code-duplication.
Can we add our own Sanger header and footer?
Currently the output is already pre-marked-up in HTML so that this is visible directly on the output:
<a href="http://www.omim.org/entry/-10">Craniosynostosis</a>
should look like:
I would suggest having the source base use unix newlines instead of windows newlines. Having the windows newlines makes it harder to work with github, if nothing else. For example, if you try to edit a file (e.g. exomiser-cli/src/main/resources/jdbc.properties) within github, you'll see that the diff is the entire file because the ^M characters at the end of every line get automatically stripped.
It seems that, by default, the jdbc connection will update the h2 database during normal Exomiser runs (presumably just last-accessed fields and whatnot). This causes problems when attempting to access the h2 database concurrently, since this then requires table locking, and ultimately can result in the database entering an inconsistent state with uncommitted changes if an Exomiser run crashes. Once that happens Exomiser can no longer run until you manually go in and drop uncommitted changes in the db.
There is an easy fix that has worked for us. Adding the following to jdbc.properties:
ACCESS_MODE_DATA=r
Since issue #2, the VcfWriter output is now more compatible with the actual VCF spec. In particular, if a variant has not been filtered, the FILTER column should be empty for that variant:
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GENOTYPE
chr3 4 . G C 2.2 ;VARIANT NOT ANALYSED - NO GENE ANNOTATIONS GT 0/1
chr1 1 . A T 2.2 PASS ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0 GT 0/1
chr1 2 . T - 2.2 Target ;EXOMISER_GENE=ABC1;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0 GT 0/1
chr2 3 . C T 2.2 Frequency;Target ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=0.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0 GT 0/1
chr3 5 . G C 2.2 ;EXOMISER_GENE=CDE2;EXOMISER_VARIANT_SCORE=1.0;EXOMISER_GENE_PHENO_SCORE=0.0;EXOMISER_GENE_VARIANT_SCORE=0.0;EXOMISER_GENE_COMBINED_SCORE=0.0 GT 0/1
However this has raised a question about the Filterable interface which I'd like some feedback on as it impacts a fundamental behaviour. Currently a Filterable either passes or fails, but really it can exist in one of three states - passed, failed and unfiltered/not yet filtered. Given this:
It would be possible to keep the existing behaviour of passedFilters and add getFilterStatus for ease of use and backwards compatibility, at the expense of some potential confusion. But this all depends on what people are using - will any of these changes actually have any direct impact on you? If not, I suggest applying clean logic, as this tends to keep things simpler, and simple is good.
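To make the three states concrete, here is a small sketch in Python. It is not the Exomiser `Filterable` interface: the class, method, and enum names are hypothetical, and treating "unfiltered" as "not passed" in the boolean view is just one possible choice, which is exactly the point under discussion.

```python
from enum import Enum


class FilterStatus(Enum):
    UNFILTERED = "unfiltered"
    PASSED = "passed"
    FAILED = "failed"


class Filterable:
    """Hypothetical stand-in for anything that can be run through filters."""

    def __init__(self):
        self.passed_filters = set()
        self.failed_filters = set()

    def add_filter_result(self, filter_name, passed):
        target = self.passed_filters if passed else self.failed_filters
        target.add(filter_name)

    def get_filter_status(self):
        # A single failure wins; with no results at all we are unfiltered.
        if self.failed_filters:
            return FilterStatus.FAILED
        if self.passed_filters:
            return FilterStatus.PASSED
        return FilterStatus.UNFILTERED

    def passed_all_filters(self):
        # Boolean view kept for backwards compatibility (assumed semantics).
        return self.get_filter_status() is FilterStatus.PASSED

    def vcf_filter_field(self):
        # "." is the VCF missing value, used when no filters were applied.
        status = self.get_filter_status()
        if status is FilterStatus.UNFILTERED:
            return "."
        if status is FilterStatus.PASSED:
            return "PASS"
        return ";".join(sorted(self.failed_filters))


variant = Filterable()
unfiltered_field = variant.vcf_filter_field()
variant.add_filter_result("Target", passed=False)
variant.add_filter_result("Frequency", passed=False)
failed_field = variant.vcf_filter_field()
```

With an explicit status, the writer can distinguish "no filters run" from "all filters passed" without inferring it from an empty set.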
This will likely be pretty invasive, so ideally it needs to be in version 6.
I'll work with Manuel on this.
Changes will affect anything involving Variant and VariantEvaluation (core.filters, core.factories).
Could also add an option to use some pre-canned lists e.g. the DDD list of developmental genes
Sometimes Jannovar is unable to annotate a variant and throws an exception. This is caught by Exomiser, but the variant is not included in the analysis or results which could lead to incorrect results.
There are two ways this could be handled:
Votes on behaviour please....
Bootstrap has some javascript stuff for this - see http://getbootstrap.com/javascript/ for inspiration...
Is this a good idea?
Link to Ensembl for variants and genes - we should make an effort to be more compatible with the rest of the campus!
Change the first sentence of the second paragraph to the text below, and combine the 1st and 2nd paragraphs so that all of it is in bold, as it is all equally important:
"Variants are prioritized according to user-defined criteria on variant frequency, pathogenicity, quality, inheritance pattern, phenotype data from human and model organisms, and proximity in the interactome to phenotypically similar genes"
Child of issue #37
Related to issues #47 and #48
The error message is really confusing. It gives the impression that some files are missing. Maybe we should add a better error, such as: "Please insert HPO terms for method phenix."
Or convert the OMIM ID to HPO terms.
/home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out (No such file or directory)
at de.charite.compbio.exomiser.core.prioritisers.util.ScoreDistributionContainer.parseDistributions(ScoreDistributionContainer.java:175)
at de.charite.compbio.exomiser.core.prioritisers.PhenixPriority.<init>(PhenixPriority.java:153)
at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.getPhenixPrioritiser(PriorityFactory.java:82)
at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.makePrioritisers(PriorityFactory.java:54)
at de.charite.compbio.exomiser.core.Exomiser.analyse(Exomiser.java:72)
at de.charite.compbio.exomiser.cli.Main.runAnalysis(Main.java:130)
at de.charite.compbio.exomiser.cli.Main.main(Main.java:62)
Allow pasting in of large list of genes or upload of a file e.g. the DDD list of developmental disorder genes
Does specifying this option remove off-target variants, or keep them? And is the default to keep them or remove them? It isn't quite clear.
-T,--remove-off-target-syn Keep off-target variants. These
are defined as intergenic,
intronic, upstream, downstream,
synonymous or intronic ncRNA
variants. Default: true
This is a nice feature in Jannovar - please can you set this up for Exomiser too.
The package structure in exomiser-core is still a bit out of kilter. I want to move everything under the core package so there can be no possibility of namespace clashes with classes from other exomiser jar files.
This will necessitate a full version number increment as it will break any existing code relying on exomiser-core.
The structure should be so:
Do we need this instead of users just using the back button? Maybe it is just me.
Handle displaying of the phenotype evidence. On current site appears in a pop-up
s/phive-allspecies/hiphive/g
-E,--hiphive-params <type> Comma separated list of optional
parameters for phive-allspecies
Maven (naturally) has a plugin for this. Investigate and then follow through with what's needed to add/update the licence headers in the source code:
http://mojo.codehaus.org/license-maven-plugin
Also, in preparation for fully opening up the codebase, it would be good to add the minimum headers required by the Maven Central Repository:
http://central.sonatype.org/pages/requirements.html
Then we can build on TravisCI and publish builds to Maven Central, making it trivially easy for Java developers to use Exomiser.
This is java heresy - an interface used as an enum. Such horrific misuse of the language must be purged with fire.
As per the protocols manuscript, running the jar without any arguments should display the cli help (currently only displayed if -help or --help is provided):
To test whether the installation was successful, run the command
$ java -jar exomiser-cli-5.0.1.jar
If the installation was successful, you will see a help message.
dbSNP seems to have changed its format recently such that alternate alleles appear in the same row with allele frequencies reported for the ref and these alts in order e.g.
9 140777306 rs4422842 C G,T,A . . RS=4422842;RSPOS=140777306;RV;dbSNPBuildID=111;SSR=0;SAO=0;VP=0x050128000a0514012e000100;WGT=1;VC=SNV;PM;PMC;SLO;NSM;REF;ASP;VLD;GNO;KGPhase3;CAF=0.846,.,.,0.154;COMMON=1
This needs to be processed to result in our frequency table
9 140777306 rs4422842 C G .
9 140777306 rs4422842 C T .
9 140777306 rs4422842 C A 0.154
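The expansion above can be sketched as follows, in Python for brevity. `parse_caf` is a hypothetical helper, not part of the Exomiser database build. It assumes the first CAF value belongs to the reference allele, the remaining values follow the ALT alleles in order, and `.` means no frequency was reported.

```python
def parse_caf(alt_field, info_field):
    """Return one (alt, frequency) pair per ALT allele.

    The frequency is None when CAF is absent or the value is '.'.
    """
    alts = alt_field.split(",")
    caf_values = None
    for entry in info_field.split(";"):
        if entry.startswith("CAF="):
            caf_values = entry[len("CAF="):].split(",")
    if caf_values is None:
        return [(alt, None) for alt in alts]
    alt_values = caf_values[1:]  # caf_values[0] is the REF frequency
    return [
        (alt, None if value == "." else float(value))
        for alt, value in zip(alts, alt_values)
    ]


# Shortened INFO field from the rs4422842 record above
info = "RS=4422842;VC=SNV;CAF=0.846,.,.,0.154;COMMON=1"
rows = parse_caf("G,T,A", info)
for alt, freq in rows:
    print("9", 140777306, "rs4422842", "C", alt, "." if freq is None else freq)
```

Note that `zip` silently truncates if the number of CAF values does not match the number of ALT alleles; a real loader would probably want to log or reject such records.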
Find the grey colouring of the suggested values a bit confusing. Seems like they will be applied as defaults.
If we do this it should be all results though i.e. not just top 200
These are really needed to catch issues arising from changes in underlying dependencies such as Jannovar and HTSJDK, where a subtle change to a variant can drastically change the outcome of an analysis.
I just pulled the newest development branch and packaged it with Maven. I also used the current data from the FTP site.
My command:
java -Xms5g -Xmx5g -jar exomiser-cli-6.0.0.jar \
--prioritiser=exomiser-allspecies -I AR -F 1 -D 607060 \
-v testVCF.vcf -o results/testresult \
--out-format=HTML \
--prioritiser phenix \
-p testPED.ped
The error is:
/home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out (No such file or directory)
at de.charite.compbio.exomiser.core.prioritisers.util.ScoreDistributionContainer.parseDistributions(ScoreDistributionContainer.java:175)
at de.charite.compbio.exomiser.core.prioritisers.PhenixPriority.<init>(PhenixPriority.java:153)
at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.getPhenixPrioritiser(PriorityFactory.java:82)
at de.charite.compbio.exomiser.core.prioritisers.PriorityFactory.makePrioritisers(PriorityFactory.java:54)
at de.charite.compbio.exomiser.core.Exomiser.analyse(Exomiser.java:72)
at de.charite.compbio.exomiser.cli.Main.runAnalysis(Main.java:130)
at de.charite.compbio.exomiser.cli.Main.main(Main.java:62)
The path is set correctly: /home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/1.out exists, but /home/mschubach/Exomiser/files/exomiser-cli-6.0.0/data/phenix/0.out does not. 0.out is not present in the download file.
Put a warning on the form about VCF files needing to be < 60Mb.
It appears that anywhere the string vcf appears in the path of the output file, it is substituted with genes.tsv, even if it doesn't appear at the end (note that the vcf directory is changed to a genes.tsv directory which doesn't exist):
2015-02-09 02:24:01,429 INFO de.charite.compbio.exomiser.core.writers.VcfResultsWriter [main] - VCF results written to file /dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf.
2015-02-09 02:24:01,431 ERROR de.charite.compbio.exomiser.core.writers.TsvGeneResultsWriter [main] - Unable to write results to file /dupa-filer/buske/phenomecentral/geno/genes.tsv/F0000009/F0000009.genes.tsv.
java.nio.file.NoSuchFileException: /dupa-filer/buske/phenomecentral/geno/genes.tsv/F0000009/F0000009.genes.tsv
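The log above is consistent with a plain string replacement over the whole path, although that is an assumption made without looking at the writer code. The Python sketch below reproduces the symptom and shows one way to swap only the file extension; both function names are hypothetical.

```python
from pathlib import PurePosixPath


def buggy_output_path(vcf_path, extension):
    # Replaces every occurrence of "vcf", including directory names.
    return vcf_path.replace("vcf", extension)


def output_path(vcf_path, extension):
    # Only the final extension of the file name is swapped.
    path = PurePosixPath(vcf_path)
    return str(path.with_name(path.stem + "." + extension))


vcf = "/dupa-filer/buske/phenomecentral/geno/vcf/F0000009/F0000009.vcf"
print(buggy_output_path(vcf, "genes.tsv"))
print(output_path(vcf, "genes.tsv"))
```

The first call produces the non-existent `geno/genes.tsv/F0000009/` directory from the error message, while the second leaves the `vcf` directory untouched.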
This could simplify the handling of, for example, whole-genome VCF files. We would use this feature in PhenomeCentral. As is, we just have to run the VCF files through a separate AWK step.
Can't do this at present, as the entire VCF file is read into memory, converted to Variants and annotated using Jannovar. Once this is done, the variants are collected into their relevant genes, then filtered.
The VCF parsing, annotation and filtering need to be streamed into a whole-exon set first; then we can continue on our merry way without requiring tens of gigs of RAM and hours of time.
This is clunky at the moment and relies on a few manual steps to pull everything together and deploy.
This could be achieved by setting up a GO CD server:
need to ask systems for a VM to deploy this to.
Start with ExomiserAllSpeciesPriority as this has a huge amount of display logic embedded within the prioritisation logic making the actual algorithm rather hard to see.
You wrote in the Exomiser draft protocol that there is a TAB-delimited file format. Right now one exists for genes only. I think for people using pipelines it would be great to have a TSV file with the variants and all their annotations (annotations are still missing from the VCF file).
I can start implementing this feature if it is OK with you.
The genes should be scored using just the variant score, i.e., add a gene (phenotype) score of zero to every gene, everything else is the same.
Results get a bit crazy looking when you have a lot of input HPO terms e.g. the classic Pfeiffer example.
Can we make each evidence section collapsible, or only show the one with the best score, i.e. the one contributing to the combined Exomiser score? I prefer the former, as we may combine all scores eventually.
This package is now redundant for the core exomiser-cli functionality. As of release 5.2.0, the HTML is rendered with Thymeleaf: the HtmlWriter simply hands the context with the data it needs to the rendering engine, which fills in resources/html/templates/results.html with this data.
@pnrobinson - Do you still need it for Panel, CRE and Walker? If not, then this package has served its purpose and it's time to retire it from the codebase. Let me know and I'll take care of it.
This will make using the Exomiser to do the filtering the default and only run the prioritisation step if explicitly specified.