
Project Repository for the class "Natural Language Processing with Transformers" 2023 at Heidelberg University

medical-rag-chatbot's Introduction

Medical Chatbot using finetuned LLM and RAG on Pubmed dataset

Name | GitHub Handle | E-Mail | Course of Study | Matriculation Number
Jonas Gann | @J-Gann | [email protected] | Data and Computer Science | 3367576
Christian Teutsch | @chTeut | [email protected] | Data and Computer Science | 3729420
Saif Mandour | @saifmandour | [email protected] | Computer Science | 4189231

Advisor: Robin Khanna

This repository contains a medical chatbot that combines a fine-tuned LLM with a retrieval-augmented generation (RAG) system built on a PubMed dataset.

See the Documentation for more information.

Installation and Running

Requirements

Setup

  • Pull the LLM using ollama pull cniongolo/biomistral

  • Install the chat-ui dependencies using cd chat-ui-rag && npm i

  • Pull the dataset

Starting

  • Start the Pinecone vector store endpoint using python3 chat-ui-rag/src/lib/server/rag/pinecone/pineconeEndpoint.py

  • As an alternative to Pinecone, start OpenSearch using docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=qaOllama2" opensearchproject/opensearch:latest, then change "vectorStoreType" to "opensearch" in .env

  • Start the MongoDB server using docker run -d -p 27017:27017 --name mongo-chatui mongo:latest

  • Start the chat-ui dev server using cd chat-ui-rag && npm run dev

  • To start the production chat-ui server instead, run cd chat-ui-rag && npm run build && pm2 start ecosystem.config.cjs
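Before opening the chat-ui, it can help to confirm that the containers started above are actually accepting connections. The following is a minimal sketch, not part of the repository: the helper names is_port_open and check_services are hypothetical, and the ports are the ones published by the docker run commands above (9200 for OpenSearch, 27017 for MongoDB).

```python
import socket

# Ports published by the docker run commands above; the keys are just labels.
SERVICES = {
    "OpenSearch": 9200,
    "MongoDB": 27017,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service label to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

An open port only shows that something is listening; it does not verify that the service finished initializing.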

Access

Repository Overview

The main components of the repository are:

User Interface: ChatUI [1]

Here you can find the user interface for the chatbot, which is based on the open-source chat-ui project by Hugging Face. We extended the project with a RAG system that inserts the content of scientific papers retrieved from the PubMed dataset.

RAG System: RAG

Here you can find the RAG system used for the chatbot. It provides an endpoint that the chat-ui queries for papers relevant to the user's question. The retrieved papers are inserted into the user prompt in buildPrompt.ts.
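To illustrate the idea of inserting retrieved papers into the user prompt, here is a minimal Python sketch. It is not the code from buildPrompt.ts: the Paper class, the build_prompt function, the max_papers limit, and the prompt wording are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical shape of one retrieved paper."""
    title: str
    abstract: str


def build_prompt(question: str, papers: list[Paper], max_papers: int = 3) -> str:
    """Prepend up to max_papers retrieved papers to the user's question."""
    if not papers:
        # Nothing retrieved: pass the question through unchanged.
        return question
    context = "\n\n".join(
        f"[{i}] {paper.title}\n{paper.abstract}"
        for i, paper in enumerate(papers[:max_papers], start=1)
    )
    return (
        "Use the following PubMed excerpts to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    demo = [Paper("Example title", "Example abstract text.")]
    print(build_prompt("What is the example about?", demo))
```

Capping the number of inserted papers keeps the prompt within the model's context window; the right limit depends on the model and on how long the retrieved texts are.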

Data Preprocessing: Preprocessing

This notebook contains the code we used to retrieve and process the PubMed dataset and to upload embeddings of the papers to the Pinecone vector store.

System Evaluation: Evaluation

This folder contains the code and results of the evaluation of the chatbot system.

Meetings: Meetings

This folder contains the notes of the meetings we had during the project.

Notes: Notes

This folder contains the notes we took during the project.

Opensearch: Opensearch

The OpenSearch vector database runs on localhost. The pubmed_preprocessing.ipynb notebook preprocesses the PubMed data, creates an index, and bulk-loads the data into the vector database. It also provides an index mapping that enables k-NN search, which can be tested in the notebook's last code section. The same code is used in opensearchEndpoint.py. To use OpenSearch instead of the Pinecone vector database, modify the .env file: in the Rag attribute of the MODELS variable, set "vectorStoreType": "opensearch" and "url": "http://127.0.0.1:9300".
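As a rough illustration of what a k-NN index mapping and a k-NN query look like in OpenSearch, the sketch below builds both request bodies as plain Python dictionaries. It is not copied from the notebook: the field names embedding and text, the function names, and the example dimension are assumptions, and the bodies would still have to be sent to OpenSearch with a client of your choice.

```python
import json


def knn_index_body(dimension: int, field: str = "embedding") -> dict:
    """Request body for creating an index with a k-NN vector field."""
    return {
        # k-NN search has to be enabled in the index settings.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # The dimension must match the embedding model's output size.
                field: {"type": "knn_vector", "dimension": dimension},
                # Hypothetical field holding the paper text.
                "text": {"type": "text"},
            }
        },
    }


def knn_query_body(vector: list[float], k: int = 5, field: str = "embedding") -> dict:
    """Search body returning the k nearest neighbours of the query vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }


if __name__ == "__main__":
    # The dimension 4 is only for demonstration.
    print(json.dumps(knn_index_body(dimension=4), indent=2))
    print(json.dumps(knn_query_body([0.1, 0.2, 0.3, 0.4], k=3), indent=2))
```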

Footnotes

  1. https://github.com/huggingface/chat-ui

medical-rag-chatbot's Issues

Choose the dataset we want to use for the project

Key metrics we defined are:

  • Ease of downloading the data (collect resources)
  • Creating a structure for the downloaded dataset (considering OpenSearch)
  • Performance evaluation of the model
  • Understanding the format of the articles and the amount of preprocessing needed to work with the data

Design system for document retrieval

Think about how to design the system that retrieves a list of relevant documents for a given question or query.
=> Define the architecture for later implementation
