IHP

Framework for identifying Human Phenotype entities

For dependencies and other uses, follow the original README.

This fork was created to accommodate an annotator for the Human Phenotype Ontology. It uses the gold standard corpora and test suites created by Bio-Lark.

Usage

Before a corpus can be loaded into IHP, the Stanford CoreNLP server must be running:

cd bin/stanford-corenlp-full-2015-12-09/
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 500000 &
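
Because the server is started in the background, the load step can begin before the server is ready. The following is a minimal Python sketch for checking that it accepts connections. It assumes the server listens on localhost at CoreNLP's default port 9000, since the command above passes no port option. The helper names `server_is_up` and `wait_for_server` are hypothetical and not part of IHP.

```python
import socket
import time


def server_is_up(host="localhost", port=9000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Port 9000 is an assumption (the CoreNLP server default); adjust it
    if the server was started with a different port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(host="localhost", port=9000, retries=30, delay=1.0):
    """Poll until the port accepts connections or the retries run out."""
    for _ in range(retries):
        if server_is_up(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # A single quick check; use wait_for_server() to poll instead.
    print("CoreNLP reachable:", server_is_up())
```

This only confirms that the port is open, not that the annotators have finished loading, so the first request may still be slow.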

Load Corpus (For both Gold Standard Corpora and Test Suite)

   python src/main.py load_corpus --goldstd hpo_train --log DEBUG
   python src/main.py load_corpus --goldstd hpo_test --log DEBUG
   python src/main.py load_corpus --goldstd tsuite --log DEBUG

Train, Test and Evaluate with StanfordNER

   python src/main.py train --goldstd hpo_train --models models/hpo_train --log DEBUG
   python src/main.py test --goldstd hpo_test -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG
   python src/evaluate.py evaluate hpo_test --results data/results_hpo_train --models models/hpo_train --log DEBUG

Train, Test and Evaluate with CRFSuite

   python src/main.py train --goldstd hpo_train --models models/hpo_train --log DEBUG --entitytype hpo --crf crfsuite
   python src/main.py test --goldstd hpo_test -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo --crf crfsuite
   python src/evaluate.py evaluate hpo_test --results data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo
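
The StanfordNER and CRFSuite runs differ only in the `--entitytype hpo` and `--crf crfsuite` flags, so the three steps can be generated from one place. The sketch below rebuilds the exact commands listed above and, by default, only prints them. The function names `build_pipeline` and `run_pipeline` are hypothetical helpers, not part of IHP, and the script is assumed to be run from the repository root.

```python
import subprocess


def build_pipeline(train_set="hpo_train", test_set="hpo_test",
                   models="models/hpo_train",
                   results="data/results_hpo_train",
                   crfsuite=False, log="DEBUG"):
    """Return the train, test and evaluate commands as argument lists."""
    common = ["--models", models, "--log", log]
    # Per the commands above, the CRFSuite run adds --entitytype hpo to
    # every step and --crf crfsuite to train and test only.
    entity = ["--entitytype", "hpo"] if crfsuite else []
    crf = ["--crf", "crfsuite"] if crfsuite else []
    train = (["python", "src/main.py", "train", "--goldstd", train_set]
             + common + entity + crf)
    test = (["python", "src/main.py", "test", "--goldstd", test_set,
             "-o", "pickle", results] + common + entity + crf)
    evaluate = (["python", "src/evaluate.py", "evaluate", test_set,
                 "--results", results] + common + entity)
    return [train, test, evaluate]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline(crfsuite=True))
```

Calling `build_pipeline(test_set="tsuite", crfsuite=True)[1:]` yields the test and evaluate commands for the test suite.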

Test and Evaluate for Test Suites

   python src/main.py test --goldstd tsuite -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo --crf crfsuite
   python src/evaluate.py evaluate tsuite --results data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo 

Rules can be added to the evaluation parameters:

   --rules andor stopwords small_ent twice_validated gowords posgowords longterms small_len quotes defwords digits lastwords
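
As a rough illustration, a subset of rules can be appended to an evaluation command programmatically. The sketch assumes `--rules` takes space-separated rule names, as in the line above, and that the names listed in this README are the ones worth checking against. The tool may accept others, so the check is a convenience against typos rather than a statement about IHP's behaviour. `KNOWN_RULES` and `with_rules` are hypothetical names.

```python
# Rule names copied from this README; not necessarily exhaustive.
KNOWN_RULES = [
    "andor", "stopwords", "small_ent", "twice_validated", "gowords",
    "posgowords", "longterms", "small_len", "quotes", "defwords",
    "digits", "lastwords",
]


def with_rules(evaluate_cmd, rules):
    """Return a copy of evaluate_cmd with '--rules <names...>' appended."""
    unknown = [r for r in rules if r not in KNOWN_RULES]
    if unknown:
        # Guard against typos; relax this if other rule names exist.
        raise ValueError("rules not listed in the README: %s" % unknown)
    return list(evaluate_cmd) + ["--rules"] + list(rules)


if __name__ == "__main__":
    base = ["python", "src/evaluate.py", "evaluate", "hpo_test",
            "--results", "data/results_hpo_train",
            "--models", "models/hpo_train",
            "--log", "DEBUG", "--entitytype", "hpo"]
    print(" ".join(with_rules(base, ["andor", "stopwords", "digits"])))
```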

FAQ

How do I run IHP on new, unlabeled, unstructured text?

Replace the sample corpus in corpora/hpo/test_corpus/ with the new, unlabeled, unstructured text and delete the contents of corpora/hpo/test_ann/. Then run:

    python src/main.py load_corpus --goldstd hpo_test --log DEBUG
    python src/main.py test --goldstd hpo_test -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG
    python src/evaluate.py evaluate hpo_test --results data/results_hpo_train --models models/hpo_train --log DEBUG

The report file, data/results_hpo_train_report.txt, will list the generated annotations as false positives because no annotation file was provided.
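
The directory preparation described in this answer can be sketched in Python. The sketch assumes the new text is a directory containing one plain-text file per document, which is a guess (this README does not state the expected file layout), and that it is run from the repository root. `prepare_unlabeled_corpus` is a hypothetical helper, not part of IHP, and it deletes the existing contents of both directories.

```python
import shutil
from pathlib import Path


def prepare_unlabeled_corpus(new_text_dir, repo_root="."):
    """Swap the sample test corpus for new text and clear the annotations.

    Returns the names of the files copied into corpora/hpo/test_corpus/.
    """
    root = Path(repo_root)
    corpus_dir = root / "corpora" / "hpo" / "test_corpus"
    ann_dir = root / "corpora" / "hpo" / "test_ann"

    # Empty both directories (destructive: back up the samples first).
    for directory in (corpus_dir, ann_dir):
        directory.mkdir(parents=True, exist_ok=True)
        for entry in directory.iterdir():
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy the new documents in; test_ann stays empty on purpose.
    copied = []
    for src in sorted(Path(new_text_dir).iterdir()):
        if src.is_file():
            shutil.copy2(src, corpus_dir / src.name)
            copied.append(src.name)
    return copied
```

After this step, the three commands above can be run unchanged.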

References:

  • M. Lobo, A. Lamurias, and F. Couto, “Identifying human phenotype terms by combining machine learning and validation rules,” BioMed Research International, vol. 2017, pp. 1-14, 2017. https://doi.org/10.1155/2017/8565739
