covid19-textmining's Introduction

COVID-19 Text Mining

Purpose

The CORD-19 dataset is a vast collection of literature on the novel coronavirus. We can apply text and data mining approaches to find answers to questions in the literature in support of the ongoing COVID-19 response efforts worldwide.

What do we know about COVID-19 risk factors?
  • Smoking, pre-existing pulmonary disease
  • Co-infections (determine whether co-existing respiratory/viral infections make the virus more transmissible or virulent) and other co-morbidities
  • Neonates and pregnant women
  • Socio-economic and behavioral factors to understand the economic impact of the virus and whether there were differences
  • Transmission dynamics of the virus, including the basic reproductive number, incubation period, serial interval, modes of transmission and environmental factors
  • Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups
  • Susceptibility of populations
  • Public health mitigation measures that could be effective for control

Method

First, documents on COVID-19 are retrieved with a BM25 search engine. Then, two methods are used to find the sentences in the retrieved papers that address the questions above.
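The retrieval step can be illustrated with a minimal sketch of Okapi BM25 scoring. This is not the project's actual search engine: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the function names, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalization penalizes longer-than-average documents.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, for illustration only.
papers = [
    "incubation period of the novel coronavirus",
    "economic impact of school closures",
    "coronavirus transmission and incubation in crowded households",
]
scores = bm25_scores("coronavirus incubation", papers)
ranked = sorted(range(len(papers)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

Documents that share no term with the query score zero, so only the top-ranked documents need to be passed on to the sentence-level methods.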

Method 1:

  1. Create TF-IDF vectors for all sentences from all papers.
  2. Compute the TF-IDF vector for the search query.
  3. Rank the sentences by cosine similarity to the query vector and keep the highest-scoring ones.
  • Pros: Fast and accurate.
  • Cons: Cannot capture semantic relationships between words.
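The three steps of Method 1 can be sketched in plain Python as follows. This is a minimal illustration rather than the project's code: the smoothed IDF formula, the whitespace tokenizer, the function names, and the example sentences are assumptions. In practice a library class such as scikit-learn's `TfidfVectorizer` would typically handle the vectorization.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_idf(sentences):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(sentences)
    doc_freq = Counter()
    for sentence in sentences:
        doc_freq.update(set(tokenize(sentence)))
    return {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(query, sentences):
    """Return (similarity, sentence) pairs, most similar first."""
    idf = build_idf(sentences)
    query_vec = tfidf_vector(query, idf)
    scored = [(cosine(query_vec, tfidf_vector(s, idf)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical example sentences, for illustration only.
sentences = [
    "the median incubation period was five days",
    "smoking was associated with severe disease",
    "pregnant women showed similar symptoms",
]
best_score, best_sentence = rank_sentences("incubation period", sentences)[0]
print(best_sentence)

# A query with no word in common with any sentence scores zero everywhere.
no_overlap_score = rank_sentences("latency", sentences)[0][0]
```

The last line illustrates the limitation listed above: a query that uses a synonym absent from the corpus gets a similarity of zero for every sentence, because TF-IDF only matches on shared terms.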

Method 2:

  1. Train word embeddings (Word2Vec) on the text of the papers.
  2. Look up the embedded word vectors for the words in the search query.
  3. Rank the sentences by Word Mover's Distance to the search query and keep those with the lowest distance.
  • Pros: Able to capture semantic relationships between words.
  • Cons: Distance calculations are slow.
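To give a feel for Method 2 without a trained model, the sketch below computes the relaxed Word Mover's Distance, a cheaper lower bound on the exact distance in which each word moves all of its weight to its nearest counterpart. It is a stand-in, not the exact Word Mover's Distance and not the project's code: the two-dimensional embeddings are hand-made and hypothetical, as are the function names and example texts. The source does not say which library was used; gensim, for example, offers Word2Vec training and a `wmdistance` method on its word vectors.

```python
import math
from collections import Counter

# Hypothetical 2-dimensional embeddings, hand-made for illustration.
# A trained Word2Vec model would supply vectors with many more dimensions.
EMBEDDINGS = {
    "incubation": (1.0, 0.0),
    "latency": (0.9, 0.1),
    "period": (0.8, 0.3),
    "duration": (0.7, 0.3),
    "smoking": (0.0, 1.0),
    "tobacco": (0.1, 0.9),
    "use": (0.2, 0.8),
}


def nbow(text):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in text.lower().split() if t in EMBEDDINGS)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def one_sided(src, dst):
    """Cost of moving each source word's weight to its nearest target word."""
    return sum(
        weight * min(math.dist(EMBEDDINGS[s], EMBEDDINGS[d]) for d in dst)
        for s, weight in src.items()
    )


def relaxed_wmd(text_a, text_b):
    """Relaxed WMD: the larger of the two one-sided transport costs."""
    a, b = nbow(text_a), nbow(text_b)
    if not a or not b:
        return math.inf  # nothing to compare if either text is out of vocabulary
    return max(one_sided(a, b), one_sided(b, a))


query = "incubation period"
candidates = ["latency duration", "tobacco use"]
distances = [relaxed_wmd(query, c) for c in candidates]
best = candidates[distances.index(min(distances))]
print(best)
```

Here the query shares no word with either candidate, yet the first candidate is closer because its words lie near the query's words in the embedding space. The exact distance additionally requires solving an optimal transport problem for each query and sentence pair, which is where the slowness noted above comes from.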

Results

Question: Incubation Period - TF-IDF

Question: Incubation Period - WMD


Question: Co-morbidities - TF-IDF

Question: Co-morbidities - WMD


Question: High Risk Group - TF-IDF

Question: High Risk Group - WMD


Question: Reproductive Number - TF-IDF

Question: Reproductive Number - WMD


Question: Pregnant Women - TF-IDF

Question: Pregnant Women - WMD


Question: Neonates of Mothers with COVID-19 - TF-IDF

Question: Neonates of Mothers with COVID-19 - WMD


covid19-textmining's People

Contributors

tchanda90
