dd-genomics's People

Contributors

adamwgoldberg, ajratner, alldefector, amwenger, colossus, hguturu, mpaull, netj, rionda, robinjia, thomaspalomares, youssefahres

dd-genomics's Issues

Fix the genes canonical/noncanonical issue

Find out whether we use noncanonical gene names.
If we do, mark this issue as obsolete.
Otherwise, implement support for noncanonical gene names and figure out what the issues are. Many more gene names will come in once we do, and they will clash with other abbreviations and with English words.
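
A minimal sketch of how the clash with English words could be measured, assuming we have the noncanonical names and a word list as plain Python collections. The function name `find_clashes` and both sample lists are hypothetical, not taken from the repository:

```python
def find_clashes(gene_names, common_words):
    """Return the gene names that collide with common words, ignoring case."""
    words = {w.lower() for w in common_words}
    return sorted(g for g in gene_names if g.lower() in words)


# Illustrative inputs only; real lists would come from the gene-list
# generation step and from an English dictionary file.
noncanonical_names = ["CAT", "WAS", "BRCA1", "IMPACT", "TP53"]
english_words = ["cat", "was", "impact", "the"]

clashes = find_clashes(noncanonical_names, english_words)
print(clashes)
```

Matching case-insensitively overestimates clashes, since many mentions in text keep the gene symbol in upper case. A case-sensitive variant would be a reasonable second pass.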

Use Trends in Dashboard

E.g., for tracking precision against the previously labeled training set, the number of labeled examples, etc.
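
As a rough sketch of the two numbers such a trend could record per run, assuming predictions are marginal probabilities keyed by candidate id and labels are booleans keyed the same way. The function name, the 0.5 threshold, and the sample data are all assumptions:

```python
def precision_against_labels(predictions, labels, threshold=0.5):
    """Return (precision, number_of_labeled_examples).

    predictions: dict mapping candidate id -> predicted probability
    labels:      dict mapping candidate id -> True/False (human label)
    Precision is None when no labeled candidate is predicted positive.
    """
    true_pos = false_pos = 0
    for cand_id, is_true in labels.items():
        prob = predictions.get(cand_id)
        if prob is None or prob < threshold:
            continue  # not predicted positive, so it does not affect precision
        if is_true:
            true_pos += 1
        else:
            false_pos += 1
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else None
    return precision, len(labels)


# Illustrative data only.
predictions = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.95}
labels = {"a": True, "b": False, "c": True, "d": True}
precision, n_labeled = precision_against_labels(predictions, labels)
```

Recording this pair after every run would give the dashboard one point per run for each trend line.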

G: Get rid of certain 'bad' gene names

Is this about resurrecting bad_genes.tsv, or (more ideally) are there specific distant supervision rules we can add and/or some systematic errors in our gene-list generation step?

Switch to faster extractors?

We are currently using the tsv_extractor. Should we switch to, e.g., the plpy or piggy extractor?

@netj any comments on what best practices are for this right now?

Also, a more minor point: right now we pre-convert Postgres arrays to strings (sentences -> sentences_input). Is this actually saving any time? It would be cleaner and simpler to do without this step. This may change if we switch extractors.
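
For context, a sketch of what the extractor side looks like when arrays arrive pre-joined as strings. The delimiter `|^|`, the column layout, and the function name `parse_row` are assumptions for illustration, not necessarily what sentences_input uses:

```python
ARRAY_DELIM = "|^|"  # assumed delimiter used when joining array elements


def parse_row(line, array_columns=(1,)):
    """Split one TSV line and turn the delimiter-joined columns back into lists."""
    fields = line.rstrip("\n").split("\t")
    for i in array_columns:
        fields[i] = fields[i].split(ARRAY_DELIM)
    return fields


# Hypothetical row: a sentence id followed by its joined word array.
row = parse_row("doc1_s3\tBRCA1|^|causes|^|cancer\n")
print(row)
```

Without the pre-conversion, the extractor would have to parse Postgres array literals itself, including their quoting and escaping. That parsing cost is what any timing comparison would need to measure.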

GP: Add Multinomial {causation, association, NONE}

Switch everything -> multinomial for GP inference

This includes changing the mindtagger template, the config VALS, the distant supervision rules, application.conf, etc.

We think this will better structure our efforts and avoid much of the confusion and debate around this issue. Basically, we have many examples that meet the "Things Aaron would find interesting" bar but are not "causal", and we would like to retain these in some structured manner.

Also, as a recap: this is not a philosophical distinction (for the most part) but a methodological one. As an example, here are two extremes:

  • Causative link: "We knocked out X in mice and they got headaches"
  • Associative link: "We performed a GWAS study on a mouse population and found a slight statistical correlation between X and headaches"
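
A minimal sketch of how a distant supervision rule could emit one of the three values instead of a boolean. The cue phrases, the enum name `GPLabel`, and the function `supervise` are hypothetical, and real rules would be far richer than substring matching:

```python
from enum import IntEnum


class GPLabel(IntEnum):
    """The three multinomial values for a gene-phenotype candidate."""
    NONE = 0
    ASSOCIATION = 1
    CAUSATION = 2


# Illustrative cue phrases only; not taken from the existing rule set.
CAUSAL_CUES = ("knocked out", "knockout", "loss-of-function")
ASSOCIATIVE_CUES = ("gwas", "correlation", "associated with")


def supervise(sentence):
    """Return a GPLabel if a rule fires, or None to leave the candidate unlabeled."""
    text = sentence.lower()
    if any(cue in text for cue in CAUSAL_CUES):
        return GPLabel.CAUSATION
    if any(cue in text for cue in ASSOCIATIVE_CUES):
        return GPLabel.ASSOCIATION
    return None


causal = supervise("We knocked out X in mice and they got headaches")
assoc = supervise(
    "We performed a GWAS study on a mouse population and found a slight "
    "statistical correlation between X and headaches"
)
```

Returning `None` when no rule fires keeps "unlabeled" distinct from the explicit `NONE` class, which would need its own negative rules.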

List of common bio abbreviations

Get a list of common bio abbreviations (PSD = postsynaptic density; PCR = polymerase chain reaction; ...). Compute the overlap with our gene names, then figure out whether we can create a blacklist or whether we can tell from context if the abbreviation or the gene name is meant.
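
One possible context check, sketched under the assumption that papers often spell out an abbreviation on first use as "long form (ABBR)". The table holds only the two entries from the example above, and the function names are hypothetical:

```python
import re

# Only the two abbreviations named in this issue; a real table would be larger.
BIO_ABBREVIATIONS = {
    "PSD": "postsynaptic density",
    "PCR": "polymerase chain reaction",
}


def ambiguous_symbols(gene_symbols, abbreviations=BIO_ABBREVIATIONS):
    """Gene symbols that are also known abbreviations (blacklist candidates)."""
    return sorted(set(gene_symbols) & set(abbreviations))


def defined_as_abbreviation(symbol, text, abbreviations=BIO_ABBREVIATIONS):
    """True if the text spells out the long form followed by '(SYMBOL)'."""
    long_form = abbreviations.get(symbol)
    if long_form is None:
        return False
    pattern = re.escape(long_form) + r"\s*\(\s*" + re.escape(symbol) + r"\s*\)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Illustrative gene list and sentences.
overlap = ambiguous_symbols(["PSD", "BRCA1", "TP53"])
abbrev_use = defined_as_abbreviation(
    "PSD", "Proteins of the postsynaptic density (PSD) were isolated."
)
gene_use = defined_as_abbreviation("PSD", "PSD mutations were found in patients.")
```

A document-level check like this is softer than a global blacklist: the symbol is only suppressed in documents that define it as the abbreviation.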

[deepdive] Implement ScopedMultinomial / non-heuristic ENTITY LINKING...?

The idea here is simple: we want to implement a version of a multinomial variable that allows each individual variable from a template to have scope X, where X is a subset of a shared reference set of values.

One motivating example is entity linking: each candidate phenotype mention would be represented as a ScopedMultinomial variable with some limited scope (say: False ∪ {a couple of possible nearest-neighbor HPO codes}), defined in reference to the full set of HPO codes.
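
To make the scoping concrete, here is a small sketch that normalizes scores over only the values in a variable's scope. It is a plain softmax restricted to a subset, not DeepDive's inference code; the function name `scoped_softmax`, the `"FALSE"` key, the scores, and the HPO-style ids are all illustrative assumptions:

```python
import math


def scoped_softmax(scores, scope):
    """Probabilities over `scope`, a subset of the shared reference set.

    scores: dict mapping every reference value -> unnormalized score
    scope:  the values this particular variable is allowed to take
    """
    scope = list(scope)
    missing = [v for v in scope if v not in scores]
    if missing:
        raise KeyError(f"scope values not in reference set: {missing}")
    top = max(scores[v] for v in scope)  # subtract the max for numerical stability
    exps = {v: math.exp(scores[v] - top) for v in scope}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}


# Shared reference set: "FALSE" plus a few illustrative HPO-style ids.
reference_scores = {
    "FALSE": 0.0,
    "HP:0002315": 2.0,
    "HP:0002076": 1.0,
    "HP:0001250": 3.0,
}
# This mention's scope: FALSE plus two nearest-neighbor candidates.
probs = scoped_softmax(reference_scores, ["FALSE", "HP:0002315", "HP:0002076"])
```

Values outside the scope get no probability mass at all for this variable, even if their score in the shared reference set is high.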
