Project Title

Bio-Medical and Topic Modelling Question Answering using NLP Transformers

Dataset Description

BioASQ is a large-scale biomedical semantic indexing and question answering (Bio-QA) dataset that contains a collection of factoid questions and their answers related to biomedical literature. It is a benchmark for evaluating the performance of biomedical QA systems, and it is used in the annual BioASQ challenge. The dataset is in JSON format and has the following structure:

version: The dataset release identifier (for example, "BioASQ6b").

data: An array of objects, each containing the following fields:
|
-> title: The title of the dataset.
|
-> paragraphs: An array of objects, each object representing a paragraph of biomedical literature and its corresponding question-answer pairs.
    |
    -> context: A string of text representing the paragraph of biomedical literature.
    |
    -> qas: An array of question-answer objects, each containing the following fields:
        |
        -> question: A string of text representing a factoid question.
        |
        -> id: A unique identifier for the question-answer pair.

Example

{
	"version": "BioASQ6b",
	"data": [{
		"title": "BioASQ6b",
		"paragraphs": [{
			"context": "The antibody aducanumab reduces A\u03b2 plaques in Alzheimer's disease. Alzheimer's disease (AD) is characterized by deposition of amyloid-\u03b2 (A\u03b2) plaques and neurofibrillary tangles in the brain, accompanied by synaptic dysfunction and neurodegeneration. Antibody-based immunotherapy against A\u03b2 to trigger its clearance or mitigate its neurotoxicity has so far been unsuccessful. Here we report the generation of aducanumab, a human monoclonal antibody that selectively targets aggregated A\u03b2. In a transgenic mouse model of AD, aducanumab is shown to enter the brain, bind parenchymal A\u03b2, and reduce soluble and insoluble A\u03b2 in a dose-dependent manner. In patients with prodromal or mild AD, one year of monthly intravenous infusions of aducanumab reduces brain A\u03b2 in a dose- and time-dependent manner. This is accompanied by a slowing of clinical decline measured by Clinical Dementia Rating-Sum of Boxes and Mini Mental State Examination scores. The main safety and tolerability findings are amyloid-related imaging abnormalities. These results justify further development of aducanumab for the treatment of AD. Should the slowing of clinical decline be confirmed in ongoing phase 3 clinical trials, it would provide compelling support for the amyloid hypothesis.",
			"qas": [{
				"question": "What disease is the drug aducanumab targeting?",
				"id": "58a95c711978bbde22000001_000"
			}]
		}]
	}]
}

Because it is distributed as JSON, the dataset can be used directly to train and evaluate machine learning models that answer factoid questions in the biomedical field.
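As a minimal sketch of reading this structure, the snippet below flattens the nested JSON into (id, question, context) rows. The function names `iter_questions` and `load_questions` are illustrative and not part of the project, and the file path is whatever the caller supplies.

```python
import json
from typing import Iterator, List, Tuple


def iter_questions(dataset: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (question_id, question, context) for every question in the nested structure."""
    for article in dataset.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                yield qa["id"], qa["question"], context


def load_questions(path: str) -> List[Tuple[str, str, str]]:
    """Read a BioASQ-style JSON file (path chosen by the caller) and flatten it."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_questions(json.load(handle)))
```

Keeping the context next to each question makes the rows easy to feed to a retriever or a question-answering model later on.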

🛠 Skills

PyTorch, MongoDB, Python, Streamlit 1.4.0, spaCy 3.2.0, HuggingFace

🔗 Links

Working Application Demonstration

My LinkedIn Profile

Bio-Medical Topics and Their Clusters

BERTopic is trained on the biomedical corpus to create topic clusters, after keeping only the nouns and adjectives extracted with the spaCy NLP library. The clusters are as follows:
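For reference, the preprocessing and training step described above could look roughly like the sketch below. It is not the project's code: the spaCy model name `en_core_web_sm`, the use of lemmas, and all function names are assumptions, and BERTopic is run with its default settings.

```python
from typing import Iterable, List, Tuple

KEPT_POS = {"NOUN", "ADJ"}  # spaCy universal POS tags for nouns and adjectives


def keep_nouns_and_adjectives(tagged: Iterable[Tuple[str, str]]) -> str:
    """Join the lemmas tagged NOUN or ADJ into one lowercase, space-separated string."""
    return " ".join(lemma.lower() for lemma, pos in tagged if pos in KEPT_POS)


def preprocess(texts: List[str], spacy_model: str = "en_core_web_sm") -> List[str]:
    """Tag each text with spaCy and reduce it to its nouns and adjectives."""
    import spacy  # imported lazily so the helper above works without spaCy installed

    # The model name is an assumption; the parser and NER are not needed for POS tags.
    nlp = spacy.load(spacy_model, disable=["ner", "parser"])
    return [
        keep_nouns_and_adjectives((tok.lemma_, tok.pos_) for tok in doc)
        for doc in nlp.pipe(texts)
    ]


def fit_topics(cleaned_texts: List[str]):
    """Fit BERTopic with default settings; return the model and per-document topic ids."""
    from bertopic import BERTopic

    topic_model = BERTopic()
    topics, _probabilities = topic_model.fit_transform(cleaned_texts)
    return topic_model, topics
```

After fitting, `topic_model.get_topic_info()` lists the discovered topics with their sizes. BERTopic needs a reasonably large corpus to form meaningful clusters, so this is meant for the full corpus rather than a handful of sentences.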

Extracting Disease Entities using NER HuggingFace Pipeline

Code Section for getting Disease Entities using HuggingFace

A transformer called Bio-Former is used with the HuggingFace NER pipeline to recognize and extract disease entities.
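A hedged sketch of this step with the HuggingFace `transformers` pipeline is shown below. The checkpoint identifier is deliberately left as a parameter because the exact Bio-Former model id is not stated here; the helper names and the 0.5 score threshold are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def build_ner(model_name: str):
    """Create a token-classification pipeline for the given checkpoint."""
    from transformers import pipeline  # lazy import: needs transformers plus a backend such as PyTorch

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("token-classification", model=model_name, aggregation_strategy="simple")


def unique_mentions(predictions: Iterable[Dict], min_score: float = 0.5) -> List[str]:
    """Return distinct entity strings in first-seen order, dropping low-confidence spans."""
    seen, mentions = set(), []
    for pred in predictions:
        text = pred["word"].strip()
        key = text.lower()  # compare case-insensitively so repeats are reported once
        if pred["score"] >= min_score and text and key not in seen:
            seen.add(key)
            mentions.append(text)
    return mentions
```

Typical use would be `unique_mentions(build_ner(model_id)(paragraph))`, where `model_id` is the disease NER checkpoint chosen for the project. The label stored in `entity_group` depends on the checkpoint, so the helper does not filter on it.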

Extracting Genetic Entities using NER HuggingFace Pipeline

Code Section for getting Genetic Entities using HuggingFace

Genetic entity recognition and extraction are done using a BioBERT-based NER model (biobert_genetic_ner).
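When the NER pipeline is run without an aggregation strategy, it returns one prediction per word piece, tagged in the B-/I- scheme. The sketch below shows one way to merge such predictions back into entity strings. It assumes a fast tokenizer (so `start` and `end` offsets are populated) and uses `B-GENETIC`/`I-GENETIC` only as example label names; the function name is illustrative.

```python
from typing import Dict, List


def merge_bio_spans(text: str, tokens: List[Dict]) -> List[str]:
    """Merge token-level B-/I- predictions into entity strings via character offsets."""
    spans = []         # [start, end] character offsets, one pair per entity
    prev_index = None  # token position of the previous tagged token
    for tok in tokens:
        label = tok["entity"]
        if not label.startswith(("B-", "I-")):
            prev_index = None  # an "O" tag (or anything else) closes the current entity
            continue
        continues = (
            label.startswith("I-")
            and prev_index is not None
            and tok["index"] == prev_index + 1  # must directly follow the previous token
        )
        if continues:
            spans[-1][1] = tok["end"]  # extend the current entity
        else:
            spans.append([tok["start"], tok["end"]])  # start a new entity
        prev_index = tok["index"]
    return [text[start:end] for start, end in spans]
```

Checking the token `index` matters because the pipeline drops "O" tokens by default, so two tagged tokens can be adjacent in the output without being adjacent in the text.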

Question-Answering using Information Retrieval and Extraction

The Colab Notebook for Question-Answering on Bio-Medical Corpus

Select 'Search Answers based on Question' in the navigation and enter a biomedical question in the text area.

Press the ENTER key to trigger the information retriever, which retrieves the 10 documents that best match the question, ranked in decreasing order of cosine similarity between the question and each document.

After the best-matching documents are retrieved, information extraction starts automatically and searches the retrieved documents for the answer to the question. For this purpose, a HuggingFace question-answering pipeline with a BioBERT transformer is used.
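The retrieve-then-extract flow described above can be sketched as follows. How the application vectorizes questions and documents is not stated here, so a plain bag-of-words vector stands in for it; the function names are illustrative, and the reader checkpoint is left as a parameter rather than guessed.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def _bag_of_words(text: str) -> Counter:
    """Stand-in vectorizer (an assumption): lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: List[str], top_k: int = 10) -> List[Tuple[float, str]]:
    """Return up to top_k (similarity, document) pairs, most similar first."""
    query = _bag_of_words(question)
    scored = [(cosine_similarity(query, _bag_of_words(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


def extract_answer(
    question: str, ranked: List[Tuple[float, str]], reader: Callable[..., Dict]
) -> Optional[Dict]:
    """Run the reader on each retrieved document and keep the highest-scoring answer."""
    candidates = [reader(question=question, context=doc) for _, doc in ranked]
    return max(candidates, key=lambda result: result["score"], default=None)


def build_reader(model_name: str):
    """Create a HuggingFace question-answering pipeline for the given checkpoint."""
    from transformers import pipeline  # lazy import: only needed when a real model is used

    return pipeline("question-answering", model=model_name)
```

A call such as `extract_answer(q, retrieve(q, corpus), build_reader(model_id))` returns a dictionary with `answer`, `score`, `start`, and `end`, or `None` when nothing was retrieved. Passing the reader in as a callable keeps the ranking logic testable without downloading a model.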

shubh2016shiv / bio_medical_question_answering_nlp
