bio-epidemiology-ner's Introduction

Bio-Epidemiology-NER is a Python library built on top of the biomedical-ner-all model to recognize bio-medical entities in a corpus or a medical report.

Feature | Output
Named Entity Recognition | Recognize 84 bio-medical entities
PDF Input | Read a PDF and tabulate the entities
PDF Annotation | Annotate entities in a medical PDF report

Tutorial

Installation

Use the package manager pip to install Bio-Epidemiology-NER

pip install Bio-Epidemiology-NER

This package depends on PyTorch. Install the configuration that matches your system from https://pytorch.org/get-started/locally/

Usage

NER with Bio-Epidemiology-NER

# load all the functions
from Bio_Epidemiology_NER.bio_recognizer import ner_prediction

# sample clinical case report to run entity recognition on
doc = """
	CASE: A 28-year-old previously healthy man presented with a 6-week history of palpitations. 
      The symptoms occurred during rest, 2–3 times per week, lasted up to 30 minutes at a time 
      and were associated with dyspnea. Except for a grade 2/6 holosystolic tricuspid regurgitation 
      murmur (best heard at the left sternal border with inspiratory accentuation), physical 
      examination yielded unremarkable findings.
      """

# returns a dataframe output
ner_prediction(corpus=doc, compute='cpu') #pass compute='gpu' if using gpu
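
The compute argument above can be chosen at runtime instead of hard-coded. The following is a minimal sketch, assuming that PyTorch's CUDA check is a good enough signal for passing compute='gpu'; pick_compute is a hypothetical helper and is not part of Bio-Epidemiology-NER.

```python
def pick_compute():
    """Return 'gpu' when PyTorch reports a usable CUDA device, otherwise 'cpu'.

    Hypothetical helper for illustration; not part of the package.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch the package cannot run at all; 'cpu' is only a safe default.
        return "cpu"
    return "gpu" if torch.cuda.is_available() else "cpu"


compute = pick_compute()
print(compute)

# Then, as in the example above:
# ner_prediction(corpus=doc, compute=compute)
```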

Annotate the entities in a Medical Report and export as pdf/csv format

# load all the functions
from Bio_Epidemiology_NER.bio_recognizer import pdf_annotate

# enter pdf file name
pdffile = 'Alhashash-2020-Emergency surgical management.pdf'

# returns an annotated pdf file
pdf_annotate(pdffile, compute='cpu', output_format='pdf')  # pass compute='gpu' if using gpu

# returns a csv file with entities
pdf_annotate(pdffile, compute='cpu', output_format='csv')  # pass compute='gpu' if using gpu

# returns both the annotated pdf and the csv file
pdf_annotate(pdffile, compute='cpu', output_format='all')  # pass compute='gpu' if using gpu

About the Model

The model within this package is an English Named Entity Recognition model, trained on the MACCROBAT dataset to recognize bio-medical entities (84 entities) in a given text corpus (case reports etc.). It is built on top of distilbert-base-uncased.

For more details on the supported entities, see the config file: https://huggingface.co/d4data/biomedical-ner-all/blob/main/config.json

Ownership & License

This package is part of the research topic "AI in Biomedical field" conducted by Deepak John Reji and Shaina Raza. If you use this work (code, model or dataset), please cite our research paper and star the repository at https://github.com/dreji18/biomedicalNER

MIT License

bio-epidemiology-ner's Issues

Repurpose this model for other domains

I need help understanding how I can repurpose this model, or start from distilbert, to recognise certain phrases in call-center recording transcripts. For example, in "Thank you for calling XX [Your call may be recorded (disclaimer)]", the phrase "Your call may be recorded" is a disclaimer.

Memory leaks

Hey! Great work with the Bio-EN model. Getting 41 entities out of any medical/clinical document is truly amazing!
But there is one observation that I made.
The memory consumed by the model to NER a document increases after every run. Is this memory leak a known issue?
Is there some modification that can be done in the model to avoid this??

Thanks
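
One way to check whether memory really grows from run to run is to record it after each call. The sketch below uses only the standard library; memory_after_each_run is a hypothetical helper, and the workload is a stand-in rather than the real model. Note that tracemalloc only sees allocations made through Python's allocator, so memory held natively by PyTorch may not show up here.

```python
import gc
import tracemalloc


def memory_after_each_run(fn, runs=5):
    """Call fn() repeatedly and record traced Python memory (bytes) after each call."""
    tracemalloc.start()
    readings = []
    try:
        for _ in range(runs):
            fn()
            gc.collect()  # free unreachable objects before measuring
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


# Stand-in workload that deliberately retains 100 kB per call.
# Replace with e.g. lambda: ner_prediction(corpus=doc, compute='cpu')
leaky_store = []
readings = memory_after_each_run(lambda: leaky_store.append(bytearray(100_000)), runs=4)
print(readings)
```

Readings that keep climbing across many runs suggest that something is being retained between calls; readings that level off after the first run or two point to one-time caching instead.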

Installation failure with pip on macOS Ventura 13.1 (22C65) - VS Code bash

PyMuPDF/setup.py: extra_link_args=[]
running bdist_wheel
running build
running build_py
running build_ext
building 'fitz._fitz' extension
swigging fitz/fitz.i to fitz/fitz_wrap.c
swig -python -o fitz/fitz_wrap.c fitz/fitz.i
error: command 'swig' failed: No such file or directory
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for PyMuPDF
Running setup.py clean for PyMuPDF
Failed to build PyMuPDF
Installing collected packages: PyMuPDF, Bio-Epidemiology-NER
Running setup.py install for PyMuPDF ... error
error: subprocess-exited-with-error

× Running setup.py install for PyMuPDF did not run successfully.
│ exit code: 1
╰─> [57 lines of output]
PyMuPDF/setup.py: sys.argv: ['/private/var/folders/qh/mr12tldn3g76pnlklr517bsm0000gn/T/pip-install-6t5y85b5/pymupdf_089caf4c54124decbecd856b9651083b/setup.py', 'install', '--record', '/private/var/folders/qh/mr12tldn3g76pnlklr517bsm0000gn/T/pip-record-g_og8vnk/install-record.txt', '--single-version-externally-managed', '--compile', '--install-headers', '/opt/homebrew/Caskroom/miniforge/base/include/python3.10/PyMuPDF']
PyMuPDF/setup.py: os.getcwd(): /private/var/folders/qh/mr12tldn3g76pnlklr517bsm0000gn/T/pip-install-6t5y85b5/pymupdf_089caf4c54124decbecd856b9651083b
PyMuPDF/setup.py: file: /private/var/folders/qh/mr12tldn3g76pnlklr517bsm0000gn/T/pip-install-6t5y85b5/pymupdf_089caf4c54124decbecd856b9651083b/setup.py
PyMuPDF/setup.py: $PYTHON_ARCH: None
PyMuPDF/setup.py: os.environ (42):
PyMuPDF/setup.py: MANPATH: /opt/homebrew/share/man:
PyMuPDF/setup.py: TERM_PROGRAM: vscode
PyMuPDF/setup.py: TERM: xterm-256color
PyMuPDF/setup.py: SHELL: /bin/bash
PyMuPDF/setup.py: HOMEBREW_REPOSITORY: /opt/homebrew
PyMuPDF/setup.py: TMPDIR: /var/folders/qh/mr12tldn3g76pnlklr517bsm0000gn/T/
PyMuPDF/setup.py: CONDA_SHLVL: 1
PyMuPDF/setup.py: CONDA_PROMPT_MODIFIER: (base)
PyMuPDF/setup.py: TERM_PROGRAM_VERSION: 1.74.2
PyMuPDF/setup.py: ORIGINAL_XDG_CURRENT_DESKTOP: undefined
PyMuPDF/setup.py: MallocNanoZone: 0
PyMuPDF/setup.py: USER: paritoshmacmini
PyMuPDF/setup.py: COMMAND_MODE: unix2003
PyMuPDF/setup.py: CONDA_EXE: /opt/homebrew/Caskroom/miniforge/base/bin/conda
PyMuPDF/setup.py: SSH_AUTH_SOCK: /private/tmp/com.apple.launchd.O6empSllv6/Listeners
PyMuPDF/setup.py: __CF_USER_TEXT_ENCODING: 0x1F5:0x0:0x0
PyMuPDF/setup.py: _CE_CONDA:
PyMuPDF/setup.py: PATH: /opt/homebrew/Caskroom/miniforge/base/bin:/opt/homebrew/Caskroom/miniforge/base/condabin:/opt/homebrew/opt/curl/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/opt/curl/bin:/usr/local/opt/[email protected]/bin:/usr/local/opt/[email protected]/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/Apple/usr/bin
PyMuPDF/setup.py: CONDA_PREFIX: /opt/homebrew/Caskroom/miniforge/base
PyMuPDF/setup.py: __CFBundleIdentifier: com.microsoft.VSCode
PyMuPDF/setup.py: PWD: /Users/paritoshmacmini/Documents/antiagingintegratedinformationsystem/antiaging
PyMuPDF/setup.py: LANG: en_US.UTF-8
PyMuPDF/setup.py: VSCODE_GIT_ASKPASS_EXTRA_ARGS: --ms-enable-electron-run-as-node
PyMuPDF/setup.py: XPC_FLAGS: 0x0
PyMuPDF/setup.py: _CE_M:
PyMuPDF/setup.py: XPC_SERVICE_NAME: 0
PyMuPDF/setup.py: SHLVL: 1
PyMuPDF/setup.py: HOME: /Users/paritoshmacmini
PyMuPDF/setup.py: VSCODE_GIT_ASKPASS_MAIN: /Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass-main.js
PyMuPDF/setup.py: HOMEBREW_PREFIX: /opt/homebrew
PyMuPDF/setup.py: DYLD_LIBRARY_PATH: :/opt/local/lib
PyMuPDF/setup.py: CONDA_PYTHON_EXE: /opt/homebrew/Caskroom/miniforge/base/bin/python
PyMuPDF/setup.py: LOGNAME: paritoshmacmini
PyMuPDF/setup.py: VSCODE_GIT_IPC_HANDLE: /var/folders/qh/mr12tldn3g76pnlklr517bsm0000gn/T/vscode-git-3acd473be0.sock
PyMuPDF/setup.py: CONDA_DEFAULT_ENV: base
PyMuPDF/setup.py: INFOPATH: /opt/homebrew/share/info:
PyMuPDF/setup.py: HOMEBREW_CELLAR: /opt/homebrew/Cellar
PyMuPDF/setup.py: VSCODE_GIT_ASKPASS_NODE: /Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper (Plugin).app/Contents/MacOS/Code Helper (Plugin)
PyMuPDF/setup.py: GIT_ASKPASS: /Applications/Visual Studio Code.app/Contents/Resources/app/extensions/git/dist/askpass.sh
PyMuPDF/setup.py: COLORTERM: truecolor
PyMuPDF/setup.py: _: /opt/homebrew/Caskroom/miniforge/base/bin/pip
PyMuPDF/setup.py: PIP_BUILD_TRACKER: /private/var/folders/qh/mr12tldn3g76pnlklr517bsm0000gn/T/pip-build-tracker-ym6c8s6t
running install
/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
running build_ext
building 'fitz._fitz' extension
swigging fitz/fitz.i to fitz/fitz_wrap.c
swig -python -o fitz/fitz_wrap.c fitz/fitz.i
error: command 'swig' failed: No such file or directory
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> PyMuPDF

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
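
The log shows pip building PyMuPDF from source and stopping because the swig command is not on PATH. A minimal sketch of a pre-install check follows; missing_build_tools is a hypothetical helper, and the assumption that swig is the only missing tool comes solely from the error line above. On a Homebrew setup like the one in the log, swig is typically installed with brew install swig.

```python
import shutil


def missing_build_tools(tools=("swig",)):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
if missing:
    print("Install these before running pip:", ", ".join(missing))
else:
    print("All required build tools were found on PATH.")
```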

Odd entity extraction with really high score

Hey Deepak, really amazing work here, using it a lot, thank you.

I'm trying to understand why I am getting some really odd entity-value pairs with really high scores:

[Two screenshots were attached here.]

Also if you need any help with this, happy to help.

DataFrame object has no attribute append

When I try to run the example using the Python package, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'append'

I think this happens because DataFrame.append was removed in pandas 2.0, so we now need to use pandas.concat instead.
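
For reference, the replacement pattern looks roughly like this. It is a sketch on a toy DataFrame; the column names are illustrative assumptions, not the package's actual output schema.

```python
import pandas as pd

# Toy data; the column names are assumed for illustration only.
df = pd.DataFrame({"entity_group": ["Age"], "value": ["28-year-old"]})
new_row = {"entity_group": "Sign_symptom", "value": "palpitations"}

# Before pandas 2.0 this was: df = df.append(new_row, ignore_index=True)
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```

When many rows are added in a loop, collecting them in a list and building the DataFrame once at the end avoids copying the frame on every iteration.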

Breaking word while labelling

Hey,
Kudos for the amazing work on biomedical ner. Really awesome how good it is. But sometimes it breaks a word into multiple tokens and labels them which is kinda weird. Can we stop the model from doing that?

e.g.:

{
    "entity_group": "Administration",
    "score": 0.46949705481529236,
    "word": "thor",
    "start": 424,
    "end": 428
},
{
    "entity_group": "Medication",
    "score": 0.7422544360160828,
    "word": "##ugh",
    "start": 428,
    "end": 431
}
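
The "##" prefix marks a WordPiece continuation of the previous token. Until the labelling itself is changed, the pieces can be glued back together in post-processing. The sketch below is one possible approach, not the package's behavior: merge_wordpieces is a hypothetical helper, and keeping the label of the higher-scoring piece is an arbitrary choice made here.

```python
def merge_wordpieces(entities):
    """Merge entries whose word starts with '##' into the entry right before them."""
    merged = []
    for ent in entities:
        is_continuation = bool(
            merged
            and ent["word"].startswith("##")
            and merged[-1]["end"] == ent["start"]  # pieces must be adjacent in the text
        )
        if is_continuation:
            prev = merged[-1]
            # Keep the label and score of the more confident piece (arbitrary choice).
            if ent["score"] > prev["score"]:
                prev["entity_group"] = ent["entity_group"]
                prev["score"] = ent["score"]
            prev["word"] += ent["word"][2:]  # drop the '##' marker
            prev["end"] = ent["end"]
        else:
            merged.append(dict(ent))  # copy so the input list is not modified
    return merged


pieces = [
    {"entity_group": "Administration", "score": 0.46949705481529236,
     "word": "thor", "start": 424, "end": 428},
    {"entity_group": "Medication", "score": 0.7422544360160828,
     "word": "##ugh", "start": 428, "end": 431},
]
result = merge_wordpieces(pieces)
print(result)
```

If the predictions come from a Hugging Face token-classification pipeline, its aggregation_strategy option (for example "first" or "max") is meant to assign one label per whole word and may be worth trying before any post-processing.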
