clinical_trial_data_extractor

Pipelines built on LangChain and Hugging Face Transformers for document localization in PubMed Central (PMC) Open Access literature.

Introduction

This is a set of small pipelines built on the LangChain library and Hugging Face Transformers for unsupervised document localization. Document localization is the problem of finding the exact section of a long article that answers a specific query or need.

Process

PMC Document Extraction: Given a PMC ID, we fetch the open access article through the BioC REST API and convert it into a simplified JSON format containing section headers, titles, and paragraphs (as provided by PMC).
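The extraction step could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names fetch_bioc and simplify_bioc are hypothetical, the URL pattern is an assumption about the BioC REST endpoint, and the passage keys ("infons", "section_type", "type", "text") are assumptions about the BioC JSON layout.

```python
import json
from urllib.request import urlopen

# Assumed URL pattern for the BioC REST API; check the BioC documentation before relying on it.
BIOC_URL = (
    "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/"
    "pmcoa.cgi/BioC_json/{pmcid}/unicode"
)


def fetch_bioc(pmcid):
    """Download the BioC JSON collection for one PMC ID (hypothetical helper)."""
    with urlopen(BIOC_URL.format(pmcid=pmcid)) as response:
        data = json.load(response)
    # Assumption: the service may wrap the collection in a one-element list.
    return data[0] if isinstance(data, list) else data


def simplify_bioc(collection):
    """Flatten a BioC collection into a list of sections with title and paragraphs."""
    sections = []
    for document in collection.get("documents", []):
        for passage in document.get("passages", []):
            text = passage.get("text", "").strip()
            if not text:
                continue
            infons = passage.get("infons", {})
            # Assumption: heading passages have a type such as "title_1" or "title_2".
            is_heading = infons.get("type", "").startswith("title")
            if is_heading or not sections:
                sections.append({
                    "section": infons.get("section_type", ""),
                    "title": text if is_heading else "",
                    "paragraphs": [] if is_heading else [text],
                })
            else:
                sections[-1]["paragraphs"].append(text)
    return sections
```

Keeping the download and the conversion in separate functions means the conversion can be checked against a saved BioC file without network access.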

Document Feature Generation/Vector Store Creation: We embed each paragraph of the article with a sentence-transformers model and store the embeddings in a FAISS index through the LangChain library, giving a vector store over the article's paragraphs. Using LangChain and the vector store, we can then find the paragraphs most relevant to a query (using similarity_search).
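To show the retrieval idea without external dependencies, here is a small stand-in sketch. It deliberately swaps the sentence-transformers embeddings for word counts and the FAISS index for a linear scan with cosine similarity, so it illustrates only the shape of the step. The class ParagraphIndex and the helpers embed and cosine are hypothetical names; only the method name similarity_search mirrors the call mentioned above.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a sentence-transformers model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ParagraphIndex:
    """Linear-scan index over paragraphs (stand-in for a FAISS vector store)."""

    def __init__(self, paragraphs):
        self.paragraphs = list(paragraphs)
        self.vectors = [embed(p) for p in self.paragraphs]

    def similarity_search(self, query, k=1):
        """Return the k paragraphs most similar to the query, best first."""
        q = embed(query)
        ranked = sorted(
            range(len(self.paragraphs)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.paragraphs[i] for i in ranked[:k]]


index = ParagraphIndex([
    "Patients were randomized to two arms: drug A or placebo.",
    "Blood samples were collected every four weeks.",
    "The authors declare no conflict of interest.",
])
best = index.similarity_search("how often were blood samples collected", k=1)
```

In the real pipeline a dense sentence embedding would also match paraphrases that share no words with the query, which this word-count version cannot do.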

The aim is to feed these relevant texts to a Large Language Model such as GPT-3 to produce JSON data for very specific requirements. For example, in this extractor my target is to find the arms, procedures, and frequency of procedures for a specific clinical trial. Through the localization module above I can identify the relevant sections of a very large document, which are then given to GPT-3 to extract the relevant arms, procedures, and frequencies.
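The hand-off to the language model might look something like the sketch below. It covers only the two ends of that step: assembling a prompt from the retrieved passages and checking the model's reply. The JSON shape in SCHEMA_HINT, the prompt wording, and the function names build_prompt and parse_extraction are all assumptions made for illustration. The actual model call is left out because it depends on the provider's client library.

```python
import json

# Hypothetical output shape; the real extractor may use different keys.
SCHEMA_HINT = (
    '{"arms": [{"name": "...", '
    '"procedures": [{"name": "...", "frequency": "..."}]}]}'
)


def build_prompt(passages):
    """Combine the retrieved passages into one extraction prompt."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Using only the excerpts below, list the trial arms, the procedures "
        "in each arm, and how often each procedure is performed.\n"
        "Reply with JSON of this shape: " + SCHEMA_HINT + "\n\n"
        "Excerpts:\n" + context
    )


def parse_extraction(reply):
    """Pull the JSON object out of a model reply and check its top-level shape."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])  # raises ValueError on malformed JSON
    if not isinstance(data.get("arms"), list):
        raise ValueError('reply is missing an "arms" list')
    return data
```

Validating the reply before using it matters because a model may wrap the JSON in extra text or omit a field, and it is easier to retry at this point than to debug malformed data downstream.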

Contributors

chilicrabcakes
