Comments (12)

Colossus commented on June 17, 2024

Pay attention here: the Gene-Pheno extractor still only takes canonical gene names! Before we pull a random subset from genepheno_relations, we should maybe update it to handle ALL possible gene names (canonical, noncanonical, RefSeq, and Ensembl, if I remember correctly).

from dd-genomics.

Colossus commented on June 17, 2024

The canonical thing should be fixed. I'm creating the table now.

Colossus commented on June 17, 2024

maybe also add holdout sets for the other extractors? pure gene, pure pheno, pure variants at least

Colossus commented on June 17, 2024

The following query shows the distribution of supervision and association/causation assessment in the holdout set:

select a.ass_superv, a.cause_superv, a.association, a.causation, count(*)
from
(select distinct
  ass.is_correct ass_superv
  , cause.is_correct cause_superv
  , (ass.expectation > 0.9) association
  , (cause.expectation > 0.9) causation
  , hs.doc_id, hs.section_id, hs.sent_id, hs.gene_wordidxs, hs.pheno_wordidxs
from
  holdout_set hs
  left join genepheno_association_is_correct_inference ass
    on (hs.doc_id = ass.doc_id
        and hs.sent_id = ass.sent_id
        and hs.section_id = ass.section_id
        and string_to_array(hs.gene_wordidxs, '|~|')::integer[] = ass.gene_wordidxs
        and string_to_array(hs.pheno_wordidxs, '|~|')::integer[] = ass.pheno_wordidxs)
  left join genepheno_causation_is_correct_inference cause
    on (hs.doc_id = cause.doc_id
        and hs.sent_id = cause.sent_id
        and hs.section_id = cause.section_id
        and string_to_array(hs.gene_wordidxs, '|~|')::integer[] = cause.gene_wordidxs
        and string_to_array(hs.pheno_wordidxs, '|~|')::integer[] = cause.pheno_wordidxs)
) a
group by a.ass_superv, a.cause_superv, a.association, a.causation
order by a.ass_superv, a.cause_superv, a.association, a.causation;
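
The joins above only match if the '|~|'-delimited word-index strings in holdout_set convert to the same integer arrays stored in the inference tables. As a rough illustration of that conversion outside the database, here is a minimal Python sketch. The function name and the sample values are made up for this example and are not part of dd-genomics.

```python
def parse_wordidxs(raw, sep="|~|"):
    """Mimic string_to_array(raw, '|~|')::integer[] for a single value.

    Returns None for a NULL input and an empty list for an empty string.
    """
    if raw is None:
        return None
    if raw == "":
        return []
    return [int(part) for part in raw.split(sep)]


# Hypothetical holdout row (string-encoded indexes) and inference row (int arrays).
holdout_key = ("doc1", 0, 3, parse_wordidxs("4|~|5"), parse_wordidxs("9"))
inference_key = ("doc1", 0, 3, [4, 5], [9])

# The join condition holds only when every component is equal.
print(holdout_key == inference_key)
```

If the separator or the index encoding ever differs between the two tables, the comparison fails and the left join yields NULLs for that holdout row.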

Colossus commented on June 17, 2024

Current result:

 ass_superv | cause_superv | association | causation | count 
------------+--------------+-------------+-----------+-------
 f          | f            | f           | f         |    19
 f          | f            | f           | t         |     1
 f          |              | f           |           |    12
 t          | f            | t           | f         |    16
 t          | t            | t           | t         |    27
            | f            | f           | f         |    29
            | f            | t           | f         |    52
            | f            |             | f         |     3
            | t            | t           | t         |     7
            |              | f           | f         |    39
            |              | t           | f         |    52
            |              | t           | t         |     6
            |              |             |           |   237
(13 rows)
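
To make the table easier to read at a glance, the counts can be tallied with a short Python sketch. The rows are copied from the result above, with empty cells written as None (psql prints NULL as blank by default). Reading a NULL supervision column as "no distant supervision label" and NULL in both expectation columns as "no matching inference row" is an assumption about the schema, not something the query output states.

```python
# (ass_superv, cause_superv, association, causation, count); None stands for NULL.
rows = [
    (False, False, False, False, 19),
    (False, False, False, True, 1),
    (False, None, False, None, 12),
    (True, False, True, False, 16),
    (True, True, True, True, 27),
    (None, False, False, False, 29),
    (None, False, True, False, 52),
    (None, False, None, False, 3),
    (None, True, True, True, 7),
    (None, None, False, False, 39),
    (None, None, True, False, 52),
    (None, None, True, True, 6),
    (None, None, None, None, 237),
]

# Total number of distinct holdout rows covered by the query.
total = sum(r[4] for r in rows)

# Rows carrying a supervision label for association or causation (assumed meaning).
supervised = sum(r[4] for r in rows if r[0] is not None or r[1] is not None)

# Rows with no expectation in either inference table (assumed: no match in the joins).
no_inference = sum(r[4] for r in rows if r[2] is None and r[3] is None)

print(total, supervised, no_inference)  # 500 166 237
```

So at the time of this result the counts add up to 500 rows, of which 166 have a supervision label in at least one of the two columns and 237 have no value in any of the four columns.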

Colossus commented on June 17, 2024

Added this query to dashboard

Colossus commented on June 17, 2024

The query for generating the holdout set (now with 800 entries, which leaves around 400 that are distantly supervised in either causation or association):

create table holdout_set as (select doc_id, section_id, sent_id, gene_wordidxs, pheno_wordidxs from genepheno_pairs order by random() limit 800) distributed by (doc_id, section_id, sent_id);
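
For reference, "order by random() limit 800" draws a uniform sample without replacement, and a rerun would pick a different sample, which is why the result is materialized as a table. The same idea can be sketched in Python as follows. The function name, the seed parameter, and the toy candidate pairs are hypothetical and only serve the illustration.

```python
import random


def draw_holdout(pairs, size=800, seed=None):
    """Draw a uniform random sample without replacement from candidate pairs."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(pairs, min(size, len(pairs)))


# Toy stand-in for genepheno_pairs: (doc_id, section_id, sent_id) tuples.
pairs = [("doc%d" % i, 0, i % 7) for i in range(5000)]
holdout = draw_holdout(pairs, size=800, seed=47)
print(len(holdout), len(set(holdout)))
```

Unlike the SQL version, the seeded draw can be repeated exactly, but either way the sample should be stored once and reused so that the labels stay attached to the same rows.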

Colossus commented on June 17, 2024

psql -U jbirgmei -h localhost -p 6432 -c "COPY (SELECT * FROM holdout_set) TO STDOUT WITH DELIMITER E'\t'" genomics_production_2 > onto/manual/genepheno_holdout_set.csv

Colossus commented on June 17, 2024

OK, the holdout set for genepheno stands. Discuss with @ajratner if we want one for gene and pheno only.

ajratner commented on June 17, 2024

We'll discuss when we meet, but at the very least, doing one at a time seems reasonable, in case any issues come up and out of respect for the unpaid labor :)
On Mon, Aug 31, 2015 at 5:02 PM Colossus wrote:

OK, the holdout set for genepheno stands. Discuss with @ajratner if we want one for gene and pheno only.


Colossus commented on June 17, 2024

This should be a running task from now on. I opt to close it.

Colossus commented on June 17, 2024

I'm going to close this: I've already done quite a bit of labeling, and we're just hiring someone to do it for us.
