
leanderme / sytora


A sophisticated smart symptom search engine

Home Page: http://sytora.com

Python 2.92% CSS 0.27% HTML 0.34% JavaScript 91.40% Shell 0.21% Makefile 0.18% Jupyter Notebook 4.68%
healthcare healthcare-datasets medical embeddings machine-learning symptom-checker symptomchecker symptoms disease umls

sytora's Introduction

Sytora

Sytora is a multilingual symptom-disease classification app. Translation between languages is handled through the UMLS, whose concept unique identifiers (CUIs) link terms in different languages to the same concept. A multinomial Naive Bayes classifier is trained on a handpicked dataset, which is freely available under CC 4.0.
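To make the classification step concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes over (disease, symptom) pairs. It is not the repo's train.py: it assumes additive smoothing with alpha = 1 and a uniform prior over diseases (scikit-learn's MultinomialNB, by contrast, learns class priors by default), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_mnb(pairs, alpha=1.0):
    """Fit multinomial Naive Bayes from (disease, symptom) pairs.

    Each pair counts as one occurrence of `symptom` for `disease`;
    `alpha` is the additive (Laplace) smoothing constant.
    """
    counts = defaultdict(Counter)  # disease -> symptom occurrence counts
    for disease, symptom in pairs:
        counts[disease][symptom] += 1
    vocab = {s for c in counts.values() for s in c}
    log_lik = {}
    for disease, c in counts.items():
        denom = sum(c.values()) + alpha * len(vocab)
        # Smoothed log P(symptom | disease) for every symptom in the vocabulary.
        log_lik[disease] = {s: math.log((c[s] + alpha) / denom) for s in vocab}
    return log_lik


def rank_diseases(log_lik, symptoms):
    """Return (disease, probability) pairs, most likely first.

    Assumes a uniform prior over diseases; symptoms unseen in training are ignored.
    """
    scores = {
        d: sum(table[s] for s in symptoms if s in table)
        for d, table in log_lik.items()
    }
    top = max(scores.values())
    # Softmax over the log-scores, shifted by the max for numerical stability.
    weights = {d: math.exp(v - top) for d, v in scores.items()}
    total = sum(weights.values())
    return sorted(((d, w / total) for d, w in weights.items()),
                  key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from the Sytora dataset.
TOY_PAIRS = [
    ("flu", "fever"), ("flu", "cough"), ("flu", "headache"),
    ("cold", "cough"), ("cold", "sneezing"),
    ("migraine", "headache"), ("migraine", "nausea"),
]

model = train_mnb(TOY_PAIRS)
print(rank_diseases(model, ["fever", "cough"]))
```

Working in log space avoids underflow when many symptoms are multiplied together, and the final softmax only rescales the scores for display; it does not change the ranking.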

To get started:

  • Clone this repo
  • Install requirements
  • Run the scripts (see below) and install the npm dependencies
  • Get a UMLS license, download the UMLS lexica and generate the database (umls.sh)
  • Start the app and open http://localhost:5001
  • Done! 🎉


Check out sytora.com for a demo.

Motivation

Finding the right diagnosis cannot be achieved just by extracting symptoms and running a classification algorithm. The hardest part is asking the right questions, focusing on what is important in the situation, connecting other events, and much more. Despite all this, I have long been excited about writing a symptom-disease lookup system to quickly find symptoms related to other symptoms, and so on. Not everything the model outputs is nonsense; in fact, it helps a lot to quickly get a list of diseases for a given set of symptoms.

Data

The data is formatted as CSV files. Example entry:

Disease,Symptom
C0162565,C0039239
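Because each row pairs one disease CUI with one symptom CUI, a disease's full symptom list is spread over several rows. A minimal sketch of regrouping them with the standard csv module (the helper name load_pairs is hypothetical, and the real data files may need a different path):

```python
import csv
import io
from collections import defaultdict


def load_pairs(fileobj):
    """Group symptom CUIs by disease CUI from a Disease,Symptom CSV."""
    symptoms_by_disease = defaultdict(set)
    for row in csv.DictReader(fileobj):
        symptoms_by_disease[row["Disease"].strip()].add(row["Symptom"].strip())
    return dict(symptoms_by_disease)


# The example entry from above; for a real file, pass open(path, newline="").
SAMPLE = "Disease,Symptom\nC0162565,C0039239\n"
print(load_pairs(io.StringIO(SAMPLE)))  # {'C0162565': {'C0039239'}}
```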

Data sources:

  • DiseaseSymptomKB.csv: extracted from the Disease-Symptom Knowledge Database. This data belongs solely to the respective authors, who are not affiliated with this project.
  • disease-symptom.csv: created by hand. Freely available under CC 4.0.

Install

Training models & generating files from data:

  1. Run cui2vec-converter.py to convert the pretrained cui2vec embeddings to GloVe format. Download the embeddings first from https://figshare.com/s/00d69861786cd0156d81 and place them in the data folder.
  2. Run generateLabels.py to create the option labels for the select fields. Languages are currently hardcoded as a list and can be extended if needed.
  3. Run train.py to train a multinomial Naive Bayes (MNB) classifier for disease prediction. It also generates the other files the app needs.
  4. Run relatedSymptoms.py to train the model for the autosuggestion feature. This uses the cui2vec embeddings. Please note that the authors of cui2vec are not affiliated with this code.
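Steps 1 and 4 both revolve around the cui2vec vectors. The sketch below is a hedged illustration, not the repo's cui2vec-converter.py or relatedSymptoms.py. It assumes the embeddings CSV has one header row with the CUI in the first column (whatever that column is called; the KeyError: 'CUI' reported in the issues suggests it may not be named CUI), writes each vector in GloVe text format (token followed by space-separated components, no header line), and ranks related concepts by cosine similarity, which is an assumed similarity measure. All function names and the toy vectors are hypothetical.

```python
import csv
import io
import math


def csv_to_glove(src, dst):
    """Rewrite 'CUI,v1,...,vn' CSV rows as GloVe text lines 'CUI v1 ... vn'."""
    reader = csv.reader(src)
    next(reader)  # skip the header row; its first cell may be empty or 'CUI'
    for cui, *values in reader:
        dst.write(cui + " " + " ".join(values) + "\n")


def load_glove(src):
    """Parse GloVe text lines into a {token: [floats]} dictionary."""
    vectors = {}
    for line in src:
        parts = line.split()
        if parts:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(vectors, cui, k=3):
    """Return the k concepts whose vectors are most cosine-similar to `cui`."""
    query = vectors[cui]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != cui]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: three made-up concept IDs with 2-dimensional vectors.
toy_csv = io.StringIO('"",V1,V2\nCUI_A,1.0,0.0\nCUI_B,0.9,0.1\nCUI_C,0.0,1.0\n')
glove_txt = io.StringIO()
csv_to_glove(toy_csv, glove_txt)
vectors = load_glove(io.StringIO(glove_txt.getvalue()))
print(related(vectors, "CUI_A", k=1))
```

With real files you would open the CSV with newline="" and write the GloVe output to a .txt file. This brute-force search scores every concept per query, so a precomputed nearest-neighbour index would be preferable for interactive autosuggestion over the full vocabulary.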

React client: cd into flaskapp and run npm install. For development, run npm run watch; for production, run npm run build.

Flask Service

A small Flask app is available to showcase the trained models. cd into the flaskapp folder and start the app:

python app.py

Deployment

Make sure to export REACT_APP_ENDPOINT with the correct address (e.g. http://yoursite.com)

Get going in ~10 min:

sudo apt update
sudo apt install python3-pip python3-dev build-essential libssl-dev libffi-dev python3-setuptools
sudo apt install python-pip python-dev
sudo apt install nodejs npm
pip install flask pandas scikit-learn numpy gunicorn
pip install Flask-Limiter flask-expects-json
pip install more-itertools requests configparser
sudo apt-get install nginx supervisor

git clone https://github.com/leanderme/sytora
cd sytora/flaskapp && npm i

sudo mkdir -p /var/log/sytora
sudo vi /etc/supervisor/conf.d/sytora.conf
sudo supervisorctl reread
sudo service supervisor restart
sudo supervisorctl status

sudo vim /etc/nginx/conf.d/virtual.conf
sudo nginx -t
sudo service nginx restart

sytora.conf:

[program:sytora]
directory=/root/sytora/flaskapp
command=gunicorn app:app -b 0.0.0.0:5001
autostart=true
autorestart=true
stderr_logfile=/var/log/sytora/sytora.err.log
stdout_logfile=/var/log/sytora/sytora.out.log

virtual.conf:

server {
    listen       80;
    server_name  site.com;

    location / {
        proxy_pass http://127.0.0.1:5001;
    }
}

Don't forget to transfer umls.db, e.g. scp ./umls.db root@address:/root/sytora/flaskapp/umls/database
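Several issues report sqlite3.OperationalError: unable to open database file, which is what sqlite3.connect() raises when the directory in the path does not exist; when only the file itself is missing, it silently creates a new, empty database instead. A small pre-flight check like the following makes either mistake fail with a clear message and opens the copied file read-only. The helper name open_umls_db is hypothetical, and the path you pass is whatever your setup actually uses.

```python
import sqlite3
from pathlib import Path


def open_umls_db(path):
    """Open an existing umls.db read-only, with a clear error if it is missing."""
    db = Path(path).resolve()
    if not db.is_file():
        raise FileNotFoundError(
            f"umls.db not found at {db}; copy it there or adjust the path"
        )
    # mode=ro makes SQLite refuse to create a new, empty database file
    # and rejects any write through this connection.
    return sqlite3.connect(db.as_uri() + "?mode=ro", uri=True)
```

Path.as_uri() percent-encodes unusual characters in the absolute path, so the resulting file: URI stays valid even for paths containing spaces.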

Coding quality, security & stability

This project was written very quickly with no performance or stability features in mind; the code base suffered accordingly. Expect things to be cleaned up soon though.

Please note that I'm a machine learning hobbyist and a medical student. The code may not be in accordance with common conventions.

Acknowledgements

This project is heavily inspired by:

sytora's People

Contributors

leanderme


sytora's Issues

database problem

I downloaded UMLS and extracted all the files, but I don't know where to place them or what the correct path is when installing.

When I run generateLabels.py, I get an "unable to open database file" error

When I run the Python script, I get the following error. How can I fix this? Please help.

D:\OfficeWorks\Sytora\sytora>python generateLabels.py
Traceback (most recent call last):
  File "generateLabels.py", line 35, in <module>
    conn = sqlite3.connect('./flaskapp/databases/umls.db')
sqlite3.OperationalError: unable to open database file


I can't find umls.db

conn = sqlite3.connect('./flaskapp/databases/umls.db')
sqlite3.OperationalError: unable to open database file
I can't find any umls.db file.

cui2vec-converter.py showing error

Traceback (most recent call last):
  File "/home/adwait/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'CUI'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./cui2vec-converter.py", line 7, in <module>
    cui_vecs_df.set_index('CUI', inplace=True)
  File "/home/adwait/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 4178, in set_index
    level = frame[col]._values
  File "/home/adwait/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/adwait/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'CUI'

Updating to 3.7 and current issues

This was a great system, but it hasn't been updated and currently has many issues with gathering the databases. I would love to see it updated and was really hoping to use it for future projects. Thank you for creating this great system; its potential applications are endless. Please let me know whether you intend to update it or are abandoning it, so I can work on an update myself. If you accept pull requests, please let me know so I can contribute!

Umls

Hi. I'm very sorry to bother you again, but this is very important. I applied for a UMLS license today, and it is processed within five days. But I really need to get the archive today or tomorrow. It's for my dissertation, which I have to present on Monday. Could I ask you to share the umls-2018AB-full.zip archive from their site with me? Thanks.

Cannot run cui2vec-converter.py

Traceback (most recent call last):
  File "C:\asd\sytora\cui2vec-converter.py", line 6, in <module>
    cui_vecs_df = pd.read_csv('./cui2vec_pretrained.csv')
  File "C:\Users\Администратор\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\Администратор\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\Администратор\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "C:\Users\Администратор\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Users\Администратор\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'./cui2vec_pretrained.csv' does not exist

Huge description.csv file size?

Hi, a quick question. When we run generateLabels.py, I see that descriptions.csv is almost 350 GB. How can I reduce the file size? Does it depend on the UMLS config file? Can you share yours?

app.py gives Exec format error

 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
INFO:werkzeug: * Running on http://0.0.0.0:5001/ (Press CTRL+C to quit)
INFO:werkzeug: * Restarting with stat
Traceback (most recent call last):
  File "app.py", line 278, in <module>
    app.run(host='0.0.0.0', port=PORT)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/flask/app.py", line 990, in run
    run_simple(host, port, self, **options)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/werkzeug/serving.py", line 1007, in run_simple
    run_with_reloader(inner, extra_files, reloader_interval, reloader_type)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/werkzeug/_reloader.py", line 332, in run_with_reloader
    sys.exit(reloader.restart_with_reloader())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/werkzeug/_reloader.py", line 176, in restart_with_reloader
    exit_code = subprocess.call(args, env=new_environ, close_fds=False)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 172, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 8] Exec format error
