shubh2016shiv / bio_medical_question_answering_nlp

Advanced information extraction & retrieval (question-answering) with Bio-Medical dataset using NLP transformers, topic modelling and document search.

Python 3.34% Jupyter Notebook 96.66%
information-extraction information-retrieval nlp streamlit-application topic-modeling transformer

bio_medical_question_answering_nlp's Introduction

Project Title

Bio-Medical and Topic Modelling Question Answering using NLP Transformers

Dataset Description

BioASQ is a large-scale biomedical semantic indexing and question answering (Bio-QA) dataset that contains a collection of factoid questions and their answers related to biomedical literature. It is a benchmark for evaluating the performance of biomedical QA systems, and it is used in the annual BioASQ challenge. The dataset is in JSON format and has the following structure:

version: The version string of the dataset (for example "BioASQ6b").

data: An array of objects, each holding a title and its paragraphs.
|
-> title: The title of the dataset.
|
-> paragraphs: An array of objects, each object representing a paragraph of biomedical literature and its corresponding questions.
    |
    -> context: A string of text representing the paragraph of biomedical literature.
    |
    -> qas: An array of question objects, each containing the following fields:
        |
        -> question: A string of text representing a factoid question.
        |
        -> id: A unique identifier for the question.

Example

{
	"version": "BioASQ6b",
	"data": [{
		"title": "BioASQ6b",
		"paragraphs": [{
			"context": "The antibody aducanumab reduces A\u03b2 plaques in Alzheimer's disease. Alzheimer's disease (AD) is characterized by deposition of amyloid-\u03b2 (A\u03b2) plaques and neurofibrillary tangles in the brain, accompanied by synaptic dysfunction and neurodegeneration. Antibody-based immunotherapy against A\u03b2 to trigger its clearance or mitigate its neurotoxicity has so far been unsuccessful. Here we report the generation of aducanumab, a human monoclonal antibody that selectively targets aggregated A\u03b2. In a transgenic mouse model of AD, aducanumab is shown to enter the brain, bind parenchymal A\u03b2, and reduce soluble and insoluble A\u03b2 in a dose-dependent manner. In patients with prodromal or mild AD, one year of monthly intravenous infusions of aducanumab reduces brain A\u03b2 in a dose- and time-dependent manner. This is accompanied by a slowing of clinical decline measured by Clinical Dementia Rating-Sum of Boxes and Mini Mental State Examination scores. The main safety and tolerability findings are amyloid-related imaging abnormalities. These results justify further development of aducanumab for the treatment of AD. Should the slowing of clinical decline be confirmed in ongoing phase 3 clinical trials, it would provide compelling support for the amyloid hypothesis.",
			"qas": [{
				"question": "What disease is the drug aducanumab targeting?",
				"id": "58a95c711978bbde22000001_000"
			}]
		}]
	}]
}
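The nested structure above can be walked with a few lines of Python. The following is a minimal sketch, not the project's own loader: the helper name `iter_questions` is hypothetical, and the sample is a shortened copy of the example with the context truncated.

```python
import json
from typing import Iterator, Tuple


def iter_questions(dataset: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (question_id, question, context) for every question in the dataset."""
    for article in dataset.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph.get("context", "")
            for qa in paragraph.get("qas", []):
                yield qa["id"], qa["question"], context


# Shortened copy of the example above; the context is truncated for brevity.
sample = json.loads("""
{
  "version": "BioASQ6b",
  "data": [{
    "title": "BioASQ6b",
    "paragraphs": [{
      "context": "The antibody aducanumab reduces A\\u03b2 plaques in Alzheimer's disease.",
      "qas": [{
        "question": "What disease is the drug aducanumab targeting?",
        "id": "58a95c711978bbde22000001_000"
      }]
    }]
  }]
}
""")

rows = list(iter_questions(sample))
for question_id, question, context in rows:
    print(question_id, "|", question)
```

For a file on disk, `json.load` on an open file handle would replace `json.loads` here.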

Because the dataset is distributed as JSON, it can be loaded directly to train and evaluate machine learning models that answer factoid questions in the biomedical field.

🛠 Skills

PyTorch, MongoDB, Python, Streamlit 1.4.0, spaCy 3.2.0, HuggingFace

🔗 Links

Working Application Demonstration

Open Application in Streamlit

My LinkedIn Profile

Bio-Medical Topics and Their Clusters

BERTopic is trained on the biomedical corpus to create topic clusters, after extracting only the nouns and adjectives from the text with the spaCy NLP library.
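The preprocessing described here (keep only nouns and adjectives, then cluster) might look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the function names are hypothetical, the spaCy model name `en_core_web_sm` is a guess, and BERTopic is used with default settings. The heavy libraries are imported lazily so the filter itself runs on hand-tagged tokens.

```python
from typing import Iterable, List, Tuple

KEEP_POS = {"NOUN", "ADJ"}  # spaCy's coarse-grained tags for nouns and adjectives


def keep_nouns_and_adjectives(tagged: Iterable[Tuple[str, str]]) -> str:
    """Join the lowercased tokens whose part-of-speech tag is NOUN or ADJ."""
    return " ".join(text.lower() for text, pos in tagged if pos in KEEP_POS)


def tag_with_spacy(text: str, model_name: str = "en_core_web_sm") -> List[Tuple[str, str]]:
    """Tag one text with spaCy. The model name is an assumption, not from the project."""
    import spacy  # lazy import so the filter above works without spaCy installed

    nlp = spacy.load(model_name)  # in real use, load once and reuse
    return [(token.text, token.pos_) for token in nlp(text)]


def fit_topics(filtered_docs: List[str]):
    """Fit BERTopic with default settings on the filtered documents."""
    from bertopic import BERTopic  # lazy import, needs the bertopic package

    topic_model = BERTopic()
    topics, _probs = topic_model.fit_transform(filtered_docs)
    return topic_model, topics


# Hand-tagged tokens, so this part runs without downloading a spaCy model.
tagged_example = [
    ("Aducanumab", "PROPN"), ("reduces", "VERB"), ("amyloid", "ADJ"),
    ("plaques", "NOUN"), ("in", "ADP"), ("the", "DET"), ("brain", "NOUN"),
]
filtered = keep_nouns_and_adjectives(tagged_example)
print(filtered)
```

Dropping verbs, determiners and prepositions before clustering tends to leave the content-bearing terms, which is presumably why the project filters on these two tags.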

Extracting Disease Entities using NER HuggingFace Pipeline

Code Section for getting Disease Entities using HuggingFace

Disease entity recognition and extraction uses a transformer called Bio-Former with the HuggingFace NER pipeline to extract disease mentions from the text.
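A minimal sketch of this kind of entity extraction follows. The README does not give the checkpoint identifier, so `model_id` is left as a parameter; the helper names, the `DISEASE` label and the scores in the example are illustrative assumptions. The example entities mimic the list of dictionaries that the pipeline returns when `aggregation_strategy="simple"` is set.

```python
from typing import Dict, List


def disease_mentions(entities: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return unique entity strings, in order of first appearance, above a score threshold."""
    seen, mentions = set(), []
    for ent in entities:
        word = ent["word"].strip()
        if ent["score"] >= min_score and word.lower() not in seen:
            seen.add(word.lower())
            mentions.append(word)
    return mentions


def run_ner(text: str, model_id: str) -> List[Dict]:
    """Run a HuggingFace token-classification pipeline on one text."""
    from transformers import pipeline  # lazy import, needs transformers and a backend

    ner = pipeline("ner", model=model_id, aggregation_strategy="simple")
    return ner(text)


# Entities in the shape the aggregated pipeline returns; label and scores are made up.
example_entities = [
    {"entity_group": "DISEASE", "score": 0.98, "word": "Alzheimer's disease", "start": 40, "end": 59},
    {"entity_group": "DISEASE", "score": 0.31, "word": "decline", "start": 100, "end": 107},
    {"entity_group": "DISEASE", "score": 0.95, "word": "alzheimer's disease", "start": 61, "end": 80},
]
print(disease_mentions(example_entities))
```

The same post-processing would apply to any other NER checkpoint, since only the model identifier changes.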

Extracting Genetic Entities using NER HuggingFace Pipeline

Code Section for getting Genetic Entities using HuggingFace

Genetic entity recognition and extraction is done using a BioBERT NER model (biobert_genetic_ner).

Question-Answering using Information Retrieval and Extraction

The Colab Notebook for Question-Answering on Bio-Medical Corpus

Select 'Search Answers based on Question' in the navigation and enter a biomedical question in the text area.

Press the ENTER key to trigger the information retriever, which retrieves the 10 documents that best match the question, ranked in decreasing order of cosine similarity between the question and each document.
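The ranking step can be sketched in plain Python as below. How the project turns the question and documents into vectors is not stated here, so the vectors are assumed to come from some encoder and the toy values are made up; `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 10) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, best match first, at most k of them."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors standing in for real question/document embeddings.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]
ranking = top_k(query, docs, k=2)
print(ranking)
```

With a large corpus, a vectorized implementation or a vector index would usually replace the Python loop, but the ordering logic stays the same.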

After the best-matching documents are retrieved, information extraction starts automatically and searches the retrieved documents for the answer to the question. This step uses a HuggingFace question-answering pipeline with a BioBERT transformer.
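One way to run the extraction step over the retrieved documents is to ask the reader model the same question against each document and keep the highest-scoring span. The sketch below assumes that strategy, which may differ from the application's. The BioBERT checkpoint is not named, so `model_id` is a parameter, and `fake_qa` is a stand-in that mimics the pipeline's `answer`/`score` output so the selection logic runs without downloading a model.

```python
from typing import Callable, Dict, List, Optional


def best_answer(question: str, contexts: List[str],
                qa: Callable[..., Dict]) -> Optional[Dict]:
    """Ask the question against each context and keep the highest-scoring answer."""
    best = None
    for rank, context in enumerate(contexts):
        result = qa(question=question, context=context)
        if best is None or result["score"] > best["score"]:
            best = {"answer": result["answer"], "score": result["score"], "doc_rank": rank}
    return best


def load_qa(model_id: str):
    """Build a HuggingFace question-answering pipeline for the given checkpoint."""
    from transformers import pipeline  # lazy import, needs transformers and a backend

    return pipeline("question-answering", model=model_id)


def fake_qa(question: str, context: str) -> Dict:
    """Stand-in for the pipeline: 'finds' the answer only if the phrase is present."""
    hit = "Alzheimer's disease" in context
    return {"answer": "Alzheimer's disease" if hit else "",
            "score": 0.9 if hit else 0.1, "start": 0, "end": 0}


retrieved = [
    "Aducanumab is a human monoclonal antibody.",
    "The antibody aducanumab reduces plaques in Alzheimer's disease.",
]
result = best_answer("What disease is the drug aducanumab targeting?", retrieved, fake_qa)
print(result)
```

Swapping `fake_qa` for the object returned by `load_qa(...)` would run the same loop with a real model.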
