BioNLPDatasets

Repo for Bio NLP Resources

Contents

  • Named Entity Recognition
  • Named Entity Normalization
  • Relation Extraction
  • Large Scale Pubmed Corpus

Named Entity Recognition

  • Drug Protein NER

Disease

  • NCBI Disease Corpus: 793 PubMed abstracts annotated with 6,892 disease mentions, which map to 790 unique disease concepts in Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). 91% of the mentions map to a single disease concept. The corpus is divided into training, development, and test sets.

Mutation Mentions of various kinds (Protein, DNA...)

  • tmVAR: The tmVar corpus contains 500 PubMed articles manually annotated with mutation mentions of various kinds.

Chemical Disease Interaction

  • BC5CDR: The BC5CDR corpus consists of 1,500 PubMed articles with 4,409 annotated chemicals, 5,818 diseases, and 3,116 chemical-disease interactions.

  • CDT: The weakly labeled corpus used in Peng et al. (2016) consists of 18,410 abstracts and 33,224 chemical-induced disease (CID) relations. The raw data was extracted from curated data in the CTD-Pfizer collaboration, with document-level annotations of drug-disease and drug-phenotype interactions.
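
Corpora such as BC5CDR are commonly distributed in PubTator format: each document has a `PMID|t|title` line and a `PMID|a|abstract` line, followed by tab-separated annotation lines. Below is a minimal sketch of a reader for that layout. The function name `parse_pubtator`, the sample record, and its placeholder concept IDs are hypothetical and exist only for illustration; they are not taken from any corpus listed here.

```python
from collections import defaultdict


def parse_pubtator(text):
    """Parse PubTator-style text into a dict keyed by PMID (illustrative sketch)."""
    docs = defaultdict(
        lambda: {"title": "", "abstract": "", "entities": [], "relations": []}
    )
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate documents
        # Title and abstract lines look like: PMID|t|text or PMID|a|text
        head = line.split("|", 2)
        if len(head) == 3 and head[0].isdigit() and head[1] in ("t", "a"):
            key = "title" if head[1] == "t" else "abstract"
            docs[head[0]][key] = head[2]
            continue
        fields = line.split("\t")
        if len(fields) >= 6:
            # Entity line: PMID, start, end, mention, type, concept ID
            pmid, start, end, mention, etype, cid = fields[:6]
            docs[pmid]["entities"].append(
                {"start": int(start), "end": int(end),
                 "text": mention, "type": etype, "id": cid}
            )
        elif len(fields) == 4:
            # Relation line: PMID, relation type, first concept ID, second concept ID
            pmid, rtype, arg1, arg2 = fields
            docs[pmid]["relations"].append((rtype, arg1, arg2))
    return dict(docs)


# Made-up record with placeholder concept IDs (not real corpus data).
SAMPLE = (
    "1|t|Aspirin induced asthma.\n"
    "1|a|A made-up abstract for illustration.\n"
    "1\t0\t7\tAspirin\tChemical\tCHEM_ID_1\n"
    "1\t16\t22\tasthma\tDisease\tDIS_ID_1\n"
    "1\tCID\tCHEM_ID_1\tDIS_ID_1\n"
)

docs = parse_pubtator(SAMPLE)
for entity in docs["1"]["entities"]:
    print(entity["type"], entity["text"], entity["id"])
```

Offsets in this format are usually counted over the title and abstract joined by a single space, so it is worth checking a few spans against the text before relying on them.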

Chemical and Drug

Relation Extraction

Gene-Disease

  • GAD: The Genetic Association Database (GAD) is an archive of human genetic association studies of complex diseases, including summary data extracted from publications on candidate gene and GWAS studies. We use GAD for the development of a corpus on associations between genes and diseases (downloaded on January 21st, 2013).

  • EU-ADR: The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts have annotated a set of 100 abstracts.

Chemical-Protein

Protein-Protein

  • PPI: This is a new, and much improved, binarization of BioInfer as reported in Heimonen et al., Complex-to-Pairwise Mapping of Biological Relationships using a Semantic Network Representation.

Drug-Drug Interaction

Drug-ADE

  • ADE: Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports.

  • TAC2017: The DDIExtraction2013 Shared Task focuses on extraction of drug-drug interactions.

  • SMM4H: Fourth Social Media Mining for Health (#SMM4H) Shared Task at ACL 2019

  • ADRMine: Corpus from "Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts".

Large Scale Pubmed Corpus

Pubmed

  • Pubmed Phrases: The dataset contains a collection of 705,915 PubMed Phrases (Kim et al., 2018) that are beneficial for information retrieval and human comprehension.

Useful Links
