This repository serves to document my undergraduate thesis project.
SehatQA is a web-based answer recommendation system. It is trained on, and retrieves answers from, Alodokter question-answer data from 2014-2020. The system performs three tasks for every question a user enters:
This task classifies user input into one or more topics (multi-label classification). For development efficiency and because of limited data, a user question has to fall under one or more of the 10 topics specified in labels.csv. Classification is performed using a neural network-based model (BiLSTM-CNN and BiGRU-CNN are among the high-performing models).
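As a rough illustration of the multi-label decision step, the sketch below turns per-topic probabilities into a list of topics. It assumes the model outputs one independent probability per topic; the function name `select_topics`, the 0.5 threshold, the fallback to the single best topic, and the example topic names are all hypothetical and not taken from this repository.

```python
def select_topics(probabilities, labels, threshold=0.5):
    """Return every label whose probability reaches the threshold.

    Assumption: `probabilities` holds one score in [0, 1] per label,
    e.g. the sigmoid outputs of a multi-label classifier.
    """
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")

    chosen = [label for label, p in zip(labels, probabilities) if p >= threshold]

    if not chosen:
        # Assumed fallback: keep the single most probable topic so that
        # every question receives at least one topic.
        best = max(range(len(labels)), key=lambda i: probabilities[i])
        chosen = [labels[best]]
    return chosen


# Hypothetical topic names, for illustration only.
example_labels = ["kulit", "gigi", "kehamilan"]
print(select_topics([0.81, 0.12, 0.64], example_labels))
```

A question can therefore receive several topics at once, which is what distinguishes this setup from ordinary single-label classification.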
This task selects the 10 questions most similar to the input. Similarity between the input question and each dataset question is measured with cosine similarity. Text is represented as vectors using pretrained FastText word vectors for Bahasa Indonesia.
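The retrieval step can be sketched as follows. This is a minimal, dependency-free sketch and not the repository's code: it assumes a question vector is the average of its word vectors (the README does not say how word vectors are combined), and the helper names `average_vector`, `cosine_similarity`, and `top_k_similar` are hypothetical. A real implementation would more likely use NumPy arrays.

```python
import math


def average_vector(tokens, word_vectors, dim):
    """Average the vectors of the tokens found in `word_vectors`.

    Assumption: unknown tokens are skipped; a question with no known
    token maps to the zero vector.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def top_k_similar(query_vec, dataset_vecs, k=10):
    """Return (index, score) pairs for the k most similar dataset vectors."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(dataset_vecs)]
    # Highest score first; ties are broken by the lower dataset index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:k]]
```

Because cosine similarity depends only on direction, a long and a short question that use the same vocabulary in similar proportions score close to 1.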
The answers to each of the selected similar questions are summarized extractively and presented back to the user as recommended answers.
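Extractive summarization keeps a subset of the original sentences instead of generating new text. The README does not name the algorithm used, so the sketch below swaps in a simple word-frequency scoring scheme purely for illustration; the function name `summarize` and the `max_sentences` parameter are hypothetical.

```python
import re
from collections import Counter


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, in their original order.

    Assumed scoring: a sentence's score is the average document-wide
    frequency of its words. This is a stand-in, not the project's method.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, breaking ties by position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Since the output only reuses sentences from the source answer, nothing is stated that the original answer did not already say.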
- Download Python 3.8.10 here
- Install packages from requirements.txt
pip install -r requirements.txt
- Upgrade scikit-learn to v1.1.1
pip install scikit-learn==1.1.1
- Install FastAPI, Uvicorn, Jinja2, and python-multipart to run the system locally
pip install fastapi uvicorn jinja2 python-multipart
- Download the pickle file containing the FastText Bahasa Indonesia word vectors here. Put the file in your local project directory.
- Start the web server locally by running this command in a terminal
uvicorn main:app --reload