
Comments (13)

ajratner commented on September 27, 2024

If you remove these, could you perhaps just comment them out so they're
easy to put back in (e.g. for data-programming testing :) )?

Also, curious to hear more detailed thoughts on this in person

On Thu, Feb 11, 2016 at 9:53 PM Colossus [email protected] wrote:


Reply to this email directly or view it on GitHub
#346.

from dd-genomics.

Colossus commented on September 27, 2024

Yeah I just commented them

chrismre commented on September 27, 2024

How are we doing in quality =)

Colossus commented on September 27, 2024

Basically I think they were good initially, but at this point our SV set is so good that these rules pick up too much, in particular when it comes to picking out wrong pairs.

ajratner commented on September 27, 2024

Awesome thanks!

Okay, this will be cool to look into further; it's interesting!

Colossus commented on September 27, 2024

90% precision as usual and 30% recall, with the big fight against picking out wrong pairs in progress. The current weapons of choice, all of which are being evaluated, are:

  • Hand-crafted factors to lower likelihood of extracting certain wrong pairs based on known patterns (e.g.: GP-GP pattern; introduce negative factor for middle P-G)
  • Mixing ddlib and treedlib; with a bunch of novel features in treedlib targeting the word sequence between gene and pheno
  • hopefully (suggestion to Alex) creating classes of treedlib features based on the length of the word sequence/dep path between gene and pheno
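
A minimal sketch of the third idea, assuming a tokenized sentence and single-token indices for the gene and phenotype mentions. The function name, the bucket boundaries, and the feature string format are all hypothetical; this is plain Python and not the ddlib or treedlib API.

```python
def between_word_features(tokens, gene_idx, pheno_idx, buckets=(3, 8)):
    """Emit features for the words strictly between two mentions.

    Every feature carries a length class, so the learner can weight the
    same word differently for short and long gene-pheno spans.
    The bucket boundaries are placeholders, not tuned values.
    """
    lo, hi = sorted((gene_idx, pheno_idx))
    between = [t.lower() for t in tokens[lo + 1:hi]]
    n = len(between)
    if n <= buckets[0]:
        length_class = "SHORT"
    elif n <= buckets[1]:
        length_class = "MEDIUM"
    else:
        length_class = "LONG"
    feats = ["BETWEEN_LEN[{}]".format(length_class)]
    feats += ["BETWEEN_WORD[{}][{}]".format(length_class, w) for w in between]
    return feats


tokens = ["BRCA1", "mutations", "cause", "breast", "cancer"]
print(between_word_features(tokens, gene_idx=0, pheno_idx=3))
```

The same bucketing could presumably be applied to the dependency-path length instead of the raw word count, if a parse is available.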

Colossus commented on September 27, 2024

We had a bunch of trouble with the expressivity of DDLib factors, but all of these issues have been fixed by Feiran and Jaeho.

ajratner commented on September 27, 2024

Generated some good new ideas for treedlib today which I'll be putting in
(at least to have option of experimenting with) soon :)

Colossus commented on September 27, 2024

So I just compared pure ddlib against a mix (ddlib for single-relationship-candidate sentences, treedlib for multi-relationship-candidate sentences), and unfortunately there is hardly any difference in precision and recall. Runs on different sets produce slightly different results, but none of them seems decisively better. Based on the results, however, I'm going with ddlib for single-relationship and treedlib for multi-relationship sentences (treedlib DID seem to do slightly better there).

I'm still hoping for the

  • word sequence features, which I'm sure would rid us of one class of "wrong-pair errors" (the class where there's simply no "good" word in between the gene and pheno mentions)
  • and hand-crafted factors, which will beat sentences that list a bunch of GP mentions in a row
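
A rough sketch of the two pieces discussed here, under stated assumptions: candidates are `(sentence_id, gene, pheno)` tuples, and the mentions of a sentence are available in order as a list of `"G"`/`"P"` labels. Both function names are hypothetical, and the returned strings `"ddlib"` and `"treedlib"` are just routing labels, not calls into either library.

```python
from collections import defaultdict


def choose_feature_library(candidates):
    """Route each sentence: one candidate -> 'ddlib', several -> 'treedlib'."""
    per_sentence = defaultdict(int)
    for sent_id, _gene, _pheno in candidates:
        per_sentence[sent_id] += 1
    return {s: ("ddlib" if n == 1 else "treedlib")
            for s, n in per_sentence.items()}


def is_middle_pg(mention_types, i, j):
    """True if mentions i and j are the middle P-G of a 'G P G P' run.

    Such a pair is the kind of candidate a hand-crafted negative factor
    could target; how the factor is weighted is outside this sketch.
    """
    a, b = sorted((i, j))
    return (b == a + 1
            and a >= 1
            and b + 1 < len(mention_types)
            and list(mention_types[a - 1:b + 2]) == ["G", "P", "G", "P"])


candidates = [("s1", "BRCA1", "breast cancer"),
              ("s2", "TP53", "sarcoma"),
              ("s2", "MDM2", "sarcoma")]
print(choose_feature_library(candidates))
print(is_middle_pg(["G", "P", "G", "P"], 1, 2))
```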

ajratner commented on September 27, 2024

For your first bullet point, I'm assuming you mean 'where there's no good
word on the dep path between', as per our convo today?

Either way, will have some stuff soon in treedlib that will hopefully help
here- will keep you updated!

Colossus commented on September 27, 2024

well not exactly, I mean "where there's no good word in the word sequence in between"
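
To make the distinction concrete, here is a small sketch of the word-sequence version of that check (not the dep-path version). The `good_words` set is a placeholder: the thread does not define which words count as "good", and the function name is hypothetical.

```python
def has_good_word_between(tokens, gene_idx, pheno_idx, good_words):
    """True if any token strictly between the two mentions is a 'good' word."""
    lo, hi = sorted((gene_idx, pheno_idx))
    return any(t.lower() in good_words for t in tokens[lo + 1:hi])


# Placeholder vocabulary, for illustration only.
good_words = {"cause", "causes", "associated"}

print(has_good_word_between(
    ["BRCA1", "mutations", "cause", "breast", "cancer"], 0, 3, good_words))
print(has_good_word_between(
    ["BRCA1", "and", "TP53", ",", "breast", "cancer"], 0, 4, good_words))
```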

Colossus commented on September 27, 2024

Psalm 45:13

ajratner commented on September 27, 2024

Ok & lol
