ramongsilva / information-extraction-from-pubmed-abstracts-sentences-on-polyphenols-anticancer-activity


This repository contains files and information about step 2 of Kaphta Architecture: Information Extraction, using the R language.



Information extraction from PubMed abstracts sentences on polyphenols anticancer activity

This repository contains files and information about step 2 of the Kaphta Architecture: Information Extraction. This stage takes the PubMed abstracts classified as positive in the previous stage (the Text Classification step) and extracts information from the sentences in which associations between recognized entities occur. The files used in the NER (Named entity recognition) and AR (Association recognition) tasks, and their respective results, are listed below.

For more information about this and other steps of the Kaphta Architecture, see the sections of the Kaphta Web Tool, available at https://portal.ifsuldeminas.edu.br/kaphtawebtool/.

NER (Named entity recognition)

  • ner-pubmed-abstracts-gh.R: R script for named entity recognition (NER) in the PubMed abstracts classified as positive in the previous stage (Text Classification step), using the PubTator API.
  • functions.R: R script with auxiliary functions. Save this file in the same folder as the ner-pubmed-abstracts-gh.R and association-recognition-pubmed-abstracts-gh.R scripts, because both scripts need it to run.
  • db_total_project.db: SQLite database needed to run all R scripts of the Kaphta Architecture steps. It contains tables with the entity dictionary, the full textual corpus of PubMed abstracts, and the PubMed abstracts classified as positive in text classification. Save this file in the same folder as the ner-pubmed-abstracts-gh.R script, which needs it to run.
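
As an illustration of what the NER step works with, the sketch below parses annotations in the tab-separated PubTator text format (title and abstract lines marked with |t| and |a|, followed by one line per recognized entity). It is a minimal sketch, not the code in ner-pubmed-abstracts-gh.R: the function name parse_pubtator_annotations, the sample record and its EXAMPLE: identifiers are all invented for this example, and whether the repository's script requests this format or another one is an assumption.

```r
# Invented sample in PubTator text format (NOT real PubTator output).
sample_pubtator <- paste(
  "100|t|Quercetin inhibits TP53-mutant breast cancer cells.",
  "100|a|Quercetin reduced proliferation in breast cancer models.",
  "100\t0\t9\tQuercetin\tChemical\tEXAMPLE:C1",
  "100\t19\t23\tTP53\tGene\tEXAMPLE:G1",
  "100\t31\t44\tbreast cancer\tDisease\tEXAMPLE:D1",
  sep = "\n"
)

# Hypothetical helper: returns one row per annotated entity.
parse_pubtator_annotations <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  # Annotation lines are tab-separated; title/abstract lines are not.
  ann <- lines[grepl("\t", lines, fixed = TRUE)]
  fields <- strsplit(ann, "\t", fixed = TRUE)
  fields <- fields[lengths(fields) >= 6]
  column <- function(i) vapply(fields, `[`, character(1), i)
  data.frame(
    pmid    = column(1),
    start   = as.integer(column(2)),  # 0-based start offset
    end     = as.integer(column(3)),  # exclusive end offset
    mention = column(4),
    type    = column(5),
    id      = column(6),
    stringsAsFactors = FALSE
  )
}

annotations <- parse_pubtator_annotations(sample_pubtator)
print(annotations)
```

Keeping the character offsets makes it possible to map each entity back to the sentence it occurs in, which the association recognition step depends on.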

AR (Association recognition)

Results of the NER and AR tasks

  • entities-recognized: folder with the output files of the NER task, containing the named entities (polyphenols, cancers and genes) recognized in the PubMed abstracts classified as positive in the previous stage (Text Classification step). Save this folder and its files in the same folder as the association-recognition-pubmed-abstracts-gh.R script, which needs it to run the Association recognition task.
  • entities-associations-sentences-recognized: folder with the output files of the NER task, containing the sentences in which associations between entities (polyphenols, cancers and genes) were recognized in the PubMed abstracts classified as positive in the previous stage (Text Classification step). Save this folder and its files in the same folder as the association-recognition-pubmed-abstracts-gh.R script, which needs it to run the Association recognition task.
  • ner-frequency: folder with files giving the frequency of the polyphenol, cancer and/or gene entities recognized in the PubMed abstracts classified as positive in the previous stage (Text Classification step).
  • Rule_associations_recognized.rar: compressed archive produced by the AR task, containing the PubMed abstract sentences in which at least one rule from the rules dictionary was recognized.
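
The idea of keeping only sentences that match at least one rule from a rules dictionary can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: the two regular-expression rules, the naive sentence splitter, the function names split_sentences and match_rules, and the sample abstract are all invented here; the real rules dictionary is the one shipped with the project.

```r
# Hypothetical rules dictionary (rule name -> regular expression).
rules <- c(
  inhibition = "\\binhibit(s|ed|ion)?\\b",
  induction  = "\\binduc(es|ed|tion)\\b"
)

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
split_sentences <- function(abstract) {
  marked <- gsub("([.!?])\\s+", "\\1\n", abstract)
  trimws(strsplit(marked, "\n", fixed = TRUE)[[1]])
}

# Keep only sentences matching at least one rule, with the matched rule names.
match_rules <- function(sentences, rules) {
  hits <- lapply(sentences, function(s) {
    matched <- vapply(rules, grepl, logical(1), x = s,
                      ignore.case = TRUE, perl = TRUE)
    names(rules)[matched]
  })
  keep <- lengths(hits) > 0
  data.frame(
    sentence = sentences[keep],
    rules    = vapply(hits[keep], paste, character(1), collapse = ";"),
    stringsAsFactors = FALSE
  )
}

# Invented abstract text, for illustration only.
abstract <- paste(
  "Resveratrol inhibited growth of colon cancer cells.",
  "The study was conducted in 2015.",
  "Curcumin induces apoptosis via TP53."
)
recognized <- match_rules(split_sentences(abstract), rules)
print(recognized)
```

In this toy run the second sentence is dropped because it matches no rule. A real pipeline would also require the sentence to contain the recognized entities before testing the rules.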

Result of AR task

The table below presents the results of the Association Recognition task, broken down by category, rule and sentence type (PC, PG, and P).

Table: total number of recognized sentence associations for each sentence type
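
A breakdown of this kind can be produced in base R by cross-tabulating the recognized sentences. The sketch below only shows the tabulation technique: the data frame toy, its column names, and the category and rule labels are placeholders invented for this example, and the counts are not results from the project.

```r
# Invented placeholder rows; NOT results from the AR task.
toy <- data.frame(
  category      = c("cat_a", "cat_a", "cat_b", "cat_b", "cat_b"),
  rule          = c("rule_1", "rule_1", "rule_2", "rule_2", "rule_3"),
  sentence_type = c("PC", "PG", "PC", "PC", "P"),
  stringsAsFactors = FALSE
)
# Fix the column order of the sentence types in the output.
toy$sentence_type <- factor(toy$sentence_type, levels = c("PC", "PG", "P"))

# Three-way contingency table: category x rule x sentence type.
counts <- xtabs(~ category + rule + sentence_type, data = toy)

# Flattened view with one row per category/rule pair.
print(ftable(counts))

# Totals per sentence type, summed over categories and rules.
totals <- apply(counts, 3, sum)
print(totals)
```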
