Comments (12)
Pay attention here: the Gene-Pheno extractor still only takes canonical gene names! Before we pull a random subset from genepheno_relations, maybe update it to accept ALL possible gene names (canonical, noncanonical, RefSeq and Ensembl, if I remember correctly).
from dd-genomics.
The canonical thing should be fixed. I'm creating the table now.
from dd-genomics.
maybe also add holdout sets for the other extractors? pure gene, pure pheno, pure variants at least
from dd-genomics.
The following query shows the distribution of supervision and association/causation assessment in the holdout set:
select a.ass_superv, a.cause_superv, a.association, a.causation, count(*)
from
  (select distinct
     ass.is_correct ass_superv
     , cause.is_correct cause_superv
     , (ass.expectation > 0.9) association
     , (cause.expectation > 0.9) causation
     , hs.doc_id, hs.section_id, hs.sent_id, hs.gene_wordidxs, hs.pheno_wordidxs
   from
     holdout_set hs
     left join genepheno_association_is_correct_inference ass
       on (hs.doc_id = ass.doc_id
           and hs.sent_id = ass.sent_id
           and hs.section_id = ass.section_id
           and string_to_array(hs.gene_wordidxs, '|~|')::integer[] = ass.gene_wordidxs
           and string_to_array(hs.pheno_wordidxs, '|~|')::integer[] = ass.pheno_wordidxs)
     left join genepheno_causation_is_correct_inference cause
       on (hs.doc_id = cause.doc_id
           and hs.sent_id = cause.sent_id
           and hs.section_id = cause.section_id
           and string_to_array(hs.gene_wordidxs, '|~|')::integer[] = cause.gene_wordidxs
           and string_to_array(hs.pheno_wordidxs, '|~|')::integer[] = cause.pheno_wordidxs)
  ) a
group by a.ass_superv, a.cause_superv, a.association, a.causation
order by a.ass_superv, a.cause_superv, a.association, a.causation;
from dd-genomics.
Current result:
 ass_superv | cause_superv | association | causation | count
------------+--------------+-------------+-----------+-------
 f          | f            | f           | f         |    19
 f          | f            | f           | t         |     1
 f          |              | f           |           |    12
 t          | f            | t           | f         |    16
 t          | t            | t           | t         |    27
            | f            | f           | f         |    29
            | f            | t           | f         |    52
            | f            |             | f         |     3
            | t            | t           | t         |     7
            |              | f           | f         |    39
            |              | t           | f         |    52
            |              | t           | t         |     6
            |              |             |           |   237
(13 rows)
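To make the table easier to read at a glance, here is a minimal Python sketch that tallies it. The rows are copied from the result above, with `None` standing for SQL NULL. The function name `summarize` and the three summary keys are made up for illustration, and reading "both inference columns NULL" as "no matching row in either inference table" is an assumption about how the left joins behave on this data.

```python
# Rows copied from the query result; None stands for SQL NULL.
# Columns: ass_superv, cause_superv, association, causation, count
ROWS = [
    (False, False, False, False, 19),
    (False, False, False, True, 1),
    (False, None, False, None, 12),
    (True, False, True, False, 16),
    (True, True, True, True, 27),
    (None, False, False, False, 29),
    (None, False, True, False, 52),
    (None, False, None, False, 3),
    (None, True, True, True, 7),
    (None, None, False, False, 39),
    (None, None, True, False, 52),
    (None, None, True, True, 6),
    (None, None, None, None, 237),
]


def summarize(rows):
    """Tally the holdout distribution (hypothetical helper, not from the repo)."""
    total = sum(count for *_, count in rows)
    # Distantly supervised in association or causation: either label is non-NULL.
    supervised = sum(
        count
        for ass_superv, cause_superv, _, _, count in rows
        if ass_superv is not None or cause_superv is not None
    )
    # Both inference columns NULL: assumed to mean no match in either table.
    unmatched = sum(
        count
        for _, _, association, causation, count in rows
        if association is None and causation is None
    )
    return {"total": total, "supervised": supervised, "unmatched": unmatched}


summary = summarize(ROWS)
print(summary)
```

The counts sum to 500 distinct holdout pairs, of which 166 carry a supervision label in at least one of the two relations.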
from dd-genomics.
Added this query to dashboard
from dd-genomics.
The query for generating the holdout set (now with 800 entries, which leaves around 400 that are distantly supervised in either causation or association):
create table holdout_set as (select doc_id, section_id, sent_id, gene_wordidxs, pheno_wordidxs from genepheno_pairs order by random() limit 800) distributed by (doc_id, section_id, sent_id);
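For anyone who wants to redo the sampling outside the database, the following is a rough Python analogue of `order by random() limit 800`. It is only a sketch: `sample_holdout` and its `seed` parameter are hypothetical and not part of dd-genomics, and the SQL above is not seeded, so the two will not pick the same rows.

```python
import random


def sample_holdout(pairs, size=800, seed=None):
    """Pick `size` candidate pairs uniformly at random, without replacement.

    `pairs` is any iterable of (doc_id, section_id, sent_id,
    gene_wordidxs, pheno_wordidxs) tuples. A fixed `seed` makes the
    draw repeatable, which the SQL version does not offer.
    """
    pairs = list(pairs)
    if size >= len(pairs):
        # Fewer candidates than requested: keep them all.
        return pairs
    rng = random.Random(seed)
    return rng.sample(pairs, size)


# Made-up candidate keys, only to show the call.
candidates = [("doc%d" % i, "sec0", i, "1|~|2", "5") for i in range(2000)]
holdout = sample_holdout(candidates, size=800, seed=47)
print(len(holdout))
```

Fixing the seed is a design choice for reproducibility; drop it to get a fresh draw each time, as the SQL does.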
from dd-genomics.
psql -U jbirgmei -h localhost -p 6432 -c "COPY (SELECT * FROM holdout_set) TO STDOUT DELIMITER E'\t'" genomics_production_2 > onto/manual/genepheno_holdout_set.csv
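A minimal sketch of reading the exported file back in Python, assuming the column order from the `create table` statement (doc_id, section_id, sent_id, gene_wordidxs, pheno_wordidxs), tab-separated fields, and word indexes joined by `|~|` as in the `string_to_array` calls. The names `HoldoutRow`, `parse_wordidxs` and `read_holdout` are hypothetical, and the ID columns are kept as strings because their types are not stated in this thread.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class HoldoutRow(NamedTuple):
    doc_id: str
    section_id: str
    sent_id: str
    gene_wordidxs: List[int]
    pheno_wordidxs: List[int]


def parse_wordidxs(field: str, sep: str = "|~|") -> List[int]:
    """Split a '|~|'-joined index string into integers."""
    return [int(part) for part in field.split(sep)] if field else []


def read_holdout(lines: Iterable[str]) -> List[HoldoutRow]:
    """Parse tab-separated holdout rows; blank lines are skipped."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if not record:
            continue
        doc_id, section_id, sent_id, gene, pheno = record
        rows.append(
            HoldoutRow(doc_id, section_id, sent_id,
                       parse_wordidxs(gene), parse_wordidxs(pheno))
        )
    return rows


# Made-up sample line standing in for onto/manual/genepheno_holdout_set.csv.
sample = io.StringIO("doc1\tsec1\t3\t4|~|5\t10\n")
parsed = read_holdout(sample)
print(parsed[0])
```

For the real file, pass an open file handle instead of the `io.StringIO` sample.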
from dd-genomics.
OK, the holdout set for genepheno stands. Discuss with @ajratner if we want one for gene and pheno only.
from dd-genomics.
We'll discuss when we meet, but at the very least, one at a time seems reasonable, both in case any issues come up and out of respect for the unpaid labor :)
On Mon, Aug 31, 2015 at 5:02 PM, Colossus wrote:
> OK, the holdout set for genepheno stands. Discuss with @ajratner if we want one for gene and pheno only.
from dd-genomics.
This should be a running task from now on. I opt to close it.
from dd-genomics.
I'm gonna close this; first of all I've done quite a bit of labeling and then we're just hiring someone to do it for us
from dd-genomics.
Related Issues (20)
- Find factor weight of gene-to-genepheno factor; experiment with setting to high weight manually
- Joint factors for deciding between multiple gp mentions HOT 3
- Check if OMIM diseases picked up correctly, fix if not
- Evaluate precision for multi-gp sentences only
- Try mixing treedlib and ddlib features HOT 1
- Fix inferred-to-Charite comparison for phenos and diseases
- Fix MindTagger genepheno templates: pheno name in pheno_names table, not postprocess
- Gene name is gene feature: experiment with throwing out word itself
- Word sequence features: "cause" missing between gene and pheno is an (almost) sure sign of failure
- Add full gene names; and remove phenos that are picked up as part of full gene name
- Features dependent on sequence length
- DS rules targeting specific gene names in the presence of certain words
- Remove regex DS rules; they might actually be hurting HOT 13
- Fix onto/make_all.sh HOT 1
- We have "NULL" strings in genepheno_relations HOT 1
- Throwing cancer (and in fact, any pheno) messes up joint factors for multiple candidate resolution
- Do distinct gp-level precision analysis
- Make writing evaluation to results_log optional
- Need linear logic for outer/inner factors
- Add p[0-9][0-9] genes again; use better supervision for those