Comments (13)
If you remove these, could you perhaps just comment them out so they're easy to put back in (e.g. for data-programming testing :) )?
Also, curious to hear more detailed thoughts on this in person.
On Thu, Feb 11, 2016 at 9:53 PM Colossus [email protected] wrote:
—
Reply to this email directly or view it on GitHub
#346.
from dd-genomics.
Yeah I just commented them
from dd-genomics.
How are we doing on quality? =)
On Thu, Feb 11, 2016 at 9:56 PM Colossus [email protected] wrote:
from dd-genomics.
Basically I think they were good initially, but at this point our SV set is so good that these rules pick up too much, in particular when it comes to picking out wrong pairs.
from dd-genomics.
Awesome, thanks!
Okay, this will be cool to look into further; it's interesting!
On Thu, Feb 11, 2016 at 9:56 PM Colossus [email protected] wrote:
from dd-genomics.
90% precision as usual and 30% recall, with the big fight against picking out wrong pairs in progress. The current weapons of choice, all of which are being evaluated, are:
- Hand-crafted factors that lower the likelihood of extracting certain wrong pairs based on known patterns (e.g. the GP-GP pattern: introduce a negative factor for the middle P-G pair)
- Mixing ddlib and treedlib, with a bunch of novel treedlib features targeting the word sequence between gene and pheno
- Hopefully (suggestion to Alex), creating classes of treedlib features based on the length of the word sequence/dep path between gene and pheno
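The third idea, feature classes keyed on gap length, can be sketched as a feature conjunction. This is a minimal illustration, not treedlib code: it assumes mentions are `[start, end)` token spans, uses the surface word-sequence length (a dep-path length would need a parser), and the bucket boundaries and the names `gap_length`, `length_bucket`, and `bucketed_features` are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of a mention (assumed representation)

# Hypothetical bucket boundaries, in tokens; these would need tuning on real data.
BUCKETS = [(0, 3, "SHORT"), (4, 8, "MEDIUM"), (9, float("inf"), "LONG")]


def gap_length(a: Span, b: Span) -> int:
    """Number of tokens strictly between two mentions, in either order."""
    first, second = sorted([a, b])
    return max(0, second[0] - first[1])


def length_bucket(n: int) -> str:
    """Map a gap length to a coarse length class."""
    for lo, hi, name in BUCKETS:
        if lo <= n <= hi:
            return name
    raise ValueError(f"negative gap length: {n}")


def bucketed_features(features: List[str], gene: Span, pheno: Span) -> List[str]:
    """Keep each base feature and add a copy conjoined with the gap's length class,
    so the same feature can learn different weights for short and long gaps."""
    bucket = length_bucket(gap_length(gene, pheno))
    return features + [f"{bucket}::{f}" for f in features]
```

For example, `bucketed_features(["WORD_SEQ[cause]"], (2, 3), (4, 6))` keeps the base feature and adds `"SHORT::WORD_SEQ[cause]"`. Keeping the unconjoined copy lets evidence be shared across lengths while the bucketed copy captures length-specific behavior.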
from dd-genomics.
We had a bunch of trouble with the expressivity of DDLib factors, but all of these have been fixed by Feiran and Jaeho
from dd-genomics.
Generated some good new ideas for treedlib today, which I'll be putting in soon (at least so we have the option of experimenting with them) :)
On Thu, Feb 11, 2016 at 9:59 PM Colossus [email protected] wrote:
from dd-genomics.
So I just compared pure ddlib vs. ddlib for single-relationship-candidate sentences plus treedlib for multi-relationship-candidate sentences, and unfortunately there is hardly any difference in precision and recall (runs on different sets produce slightly different results, but neither setup seems decisively better). Based on the results, however, I'm going with ddlib for single-relationship sentences and treedlib for multi-relationship sentences (treedlib DID seem to do slightly better there).
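The split described above amounts to picking a feature extractor per candidate based on how many gene-pheno candidates share its sentence. A minimal sketch, assuming candidates are `(sentence_id, gene_mention_id, pheno_mention_id)` tuples; that layout and the name `route_features` are hypothetical, and the two extractors are passed in as plain callables rather than calling any real ddlib or treedlib API.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed candidate layout: (sentence_id, gene_mention_id, pheno_mention_id).
Candidate = Tuple[str, str, str]
Extractor = Callable[[Candidate], List[str]]


def route_features(
    candidates: Iterable[Candidate],
    single_extractor: Extractor,  # e.g. a wrapper around ddlib-style features
    multi_extractor: Extractor,   # e.g. a wrapper around treedlib-style features
) -> Dict[Candidate, List[str]]:
    """Use single_extractor when a candidate is alone in its sentence,
    multi_extractor when the sentence has several gene-pheno candidates."""
    cands = list(candidates)
    per_sentence = Counter(c[0] for c in cands)  # candidates per sentence id
    out: Dict[Candidate, List[str]] = {}
    for c in cands:
        extractor = single_extractor if per_sentence[c[0]] == 1 else multi_extractor
        out[c] = extractor(c)
    return out
```

With stub extractors, a sentence holding one candidate is routed to the first callable and a sentence holding two is routed to the second.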
I'm still hoping for:
- word sequence features, which I'm sure would get rid of one class of "wrong-pair errors" (the class where there's simply no "good" word between the gene and pheno mentions)
- hand-crafted factors, which will beat sentences that list a bunch of GP mentions in a row
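One way to target sentences that list GP mentions in a row is the "middle P-G" check from the GP-GP pattern mentioned earlier: in a run like G P G P, the outer pairs are plausible while the pheno-gene pair in the middle is a likely wrong pair. This is a minimal sketch under that assumed reading of the pattern; `is_middle_pg` is a hypothetical helper whose output would feed a negative-weight factor or feature, not existing dd-genomics code.

```python
from typing import List


def is_middle_pg(mention_types: List[str], gene_pos: int, pheno_pos: int) -> bool:
    """True if the candidate pairs the P and G in the middle of a G P G P run.

    mention_types lists the sentence's mentions in textual order, each "G" or "P";
    gene_pos and pheno_pos index into that list. The window slides, so longer
    runs such as G P G P G P are covered too.
    """
    i = pheno_pos
    # The gene must directly follow the pheno, with room for one mention on each side.
    if gene_pos != i + 1 or i - 1 < 0 or i + 2 >= len(mention_types):
        return False
    return mention_types[i - 1:i + 3] == ["G", "P", "G", "P"]
```

For `["G", "P", "G", "P"]`, the candidate pairing mention 1 (P) with mention 2 (G) is flagged, while the outer pair of mention 0 (G) and mention 1 (P) is not.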
from dd-genomics.
For your first bullet point, I'm assuming you mean 'where there's no good word on the dep path between', as per our convo today?
Either way, I'll have some stuff in treedlib soon that will hopefully help here; will keep you updated!
On Thu, Feb 11, 2016 at 10:04 PM Colossus [email protected] wrote:
from dd-genomics.
Well, not exactly; I mean "where there's no good word in the word sequence in between".
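A surface word-sequence version of that feature (as opposed to a dep-path one) could look like the sketch below. It assumes tokenized sentences and `[start, end)` mention spans; the cue-word set is a hypothetical placeholder (a real one would be curated from data, e.g. around "cause"), and `word_sequence_features` is an illustrative name, not a ddlib or treedlib function.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of a mention (assumed representation)

# Hypothetical cue words; a curated list would replace this.
CUE_WORDS: Set[str] = {"cause", "causes", "caused", "causing", "associated", "mutation", "mutations"}


def between_tokens(tokens: List[str], a: Span, b: Span) -> List[str]:
    """Tokens strictly between two mentions, whichever comes first."""
    first, second = sorted([a, b])
    return tokens[first[1]:second[0]]


def word_sequence_features(
    tokens: List[str], gene: Span, pheno: Span, cues: Set[str] = CUE_WORDS
) -> List[str]:
    """Emit one feature per cue word found between the mentions,
    or a single NO_CUE_WORD_BETWEEN feature when there is none."""
    between = {t.lower() for t in between_tokens(tokens, gene, pheno)}
    hits = sorted(between & cues)
    if not hits:
        return ["NO_CUE_WORD_BETWEEN"]
    return [f"CUE_WORD_BETWEEN[{w}]" for w in hits]
```

For "Mutations in BRCA1 cause breast cancer" with the gene at `(2, 3)` and the pheno at `(4, 6)`, this yields `["CUE_WORD_BETWEEN[cause]"]`; a learned negative weight on `NO_CUE_WORD_BETWEEN` is what would target the wrong-pair class described above.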
from dd-genomics.
Psalm 45:13
from dd-genomics.
Ok & lol
On Thu, Feb 11, 2016 at 10:08 PM Colossus [email protected] wrote:
from dd-genomics.
Related Issues (20)
- Find factor weight of gene-to-genepheno factor; experiment with setting to high weight manually
- Joint factors for deciding between multiple gp mentions
- Check if OMIM diseases picked up correctly, fix if not
- Evaluate precision for multi-gp sentences only
- Try mixing treedlib and ddlib features
- Fix inferred-to-Charite comparison for phenos and diseases
- Fix MindTagger genepheno templates: pheno name in pheno_names table, not postprocess
- Gene name is gene feature: experiment with throwing out word itself
- Word sequence features: "cause" missing between gene and pheno is an (almost) sure sign of failure
- Add full gene names; and remove phenos that are picked up as part of full gene name
- Features dependent on sequence length
- DS rules targeting specific gene names in the presence of certain words
- Fix onto/make_all.sh
- We have "NULL" strings in genepheno_relations
- Throwing cancer (and in fact, any pheno) messes up joint factors for multiple candidate resolution
- Do distinct gp-level precision analysis
- Make writing evaluation to results_log optional
- Need linear logic for outer/inner factors
- Add p[0-9][0-9] genes again; use better supervision for those