zeta1999 / deeprank

This project forked from vcvpaiva/deeprank

A first cut at exploring the use of dependency links for building text graphs that, with the help of a centrality algorithm like *PageRank*, can extract relevant keywords and summaries from text documents.

License: Apache License 2.0

Python 82.96% Shell 0.66% Prolog 16.38%

deeprank's Introduction

**The system uses dependency links to build text graphs that, with the help of a centrality algorithm like PageRank, extract relevant keyphrases, summaries and relations from text documents. A SWI-Prolog based module adds an interactive shell for talking about the document with a dialog agent that extracts, for each query, the most relevant sentences covering the document. Spoken dialog is also available if the OS supports it. Developed with Python 3 on OS X, but also working on Linux.**

DeepRank is based on two packages.

TextGraphCrafts

Python-based summary, keyphrase and relation extractor from text documents using dependency graphs.

HOME: https://github.com/ptarau/TextGraphCrafts

Project Description

**The system uses dependency links to build text graphs that, with the help of a centrality algorithm like PageRank, extract relevant keyphrases, summaries and relations from text documents. Developed with Python 3 on OS X, but portable to Linux.**
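As a rough illustration of the idea described above, the following minimal sketch ranks the nodes of a tiny hand-made graph with a plain power-iteration PageRank. The edge list, the function names `build_graph` and `pagerank`, and the damping value are assumptions made for this example; the package itself builds its graphs from parser output and may rank them differently.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency map from (source, target) link pairs."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
        graph[dst]  # touch the key so nodes without outgoing links are kept
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an adjacency map."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


# Hypothetical links for "the quick brown fox jumps over the lazy dog",
# each pointing from a dependent word to its head (an assumed direction).
edges = [("quick", "fox"), ("brown", "fox"), ("fox", "jumps"),
         ("dog", "jumps"), ("lazy", "dog")]
ranks = pagerank(build_graph(edges))
top = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(top)
```

Words that many other words point to, directly or indirectly, collect the most rank, which is why the highest-ranked nodes are candidate keywords.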

Dependencies:

  • python 3.7 or newer, pip3, java 9.x or newer. Also, having git installed is recommended for easy updates
  • pip3 install nltk
  • also, run something like the following in python3:
import nltk
nltk.download('wordnet')
nltk.download('words')
nltk.download('stopwords')
  • or, if that fails on a Mac, run python3 down.py to collect the desired nltk resource files.
  • pip3 install networkx
  • pip3 install requests
  • pip3 install graphviz; also ensure that .gv files can be viewed
  • pip3 install stanfordnlp (the parser)
  • Note that stanfordnlp requires torch binaries, which are easier to install with anaconda.

Tested with the above on a Mac, with macOS Mojave and Catalina and on Ubuntu Linux 18.x.
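Before starting the server, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch, assuming the import names match the pip package names; the helper name `missing_modules` is made up for this example and is not part of the project.

```python
from importlib.util import find_spec

# Import names of the pip packages listed in the dependencies above.
REQUIRED = ["nltk", "networkx", "requests", "graphviz", "stanfordnlp"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("install with pip3:", " ".join(missing))
else:
    print("all Python dependencies found")
```

This only checks the Python side; java, and the torch binaries needed by stanfordnlp, still have to be verified separately.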

Running it:

in a shell window, run

start_server.sh

in another shell window, start with

python3 -i tests.py

and then interactively, at the ">>>" prompt, try

>>> test1()
>>> test2()
>>> ...
>>> test9()
>>> test12()
>>> test0()

see how to activate other outputs in file

deepRank.py

text file inputs (including the US Constitution const.txt) are in the folder

examples/

Handling PDF documents

The easiest way to do this is to install pdftotext, which is part of Poppler tools.

If pdftotext is installed, you can place a file like textrank.pdf already in subdirectory pdfs/ and try something similar to:

Change settings in file params.py to use the system with other global parameter settings.
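The conversion step that pdftotext performs can also be driven from Python. This is a minimal sketch, assuming the text file should be written next to the PDF with a .txt suffix; the helper names `pdftotext_command` and `pdf_to_text` are hypothetical and do not come from the project.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path):
    """Build the pdftotext call that writes a .txt file next to the PDF."""
    pdf = Path(pdf_path)
    return ["pdftotext", str(pdf), str(pdf.with_suffix(".txt"))]


def pdf_to_text(pdf_path):
    """Convert a PDF to text; return the output path, or None if pdftotext is missing."""
    if shutil.which("pdftotext") is None:
        return None
    cmd = pdftotext_command(pdf_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the conversion fails
    return Path(cmd[-1])


print(pdftotext_command("pdfs/textrank.pdf"))
```

Checking for the executable first gives a clear signal that Poppler tools still need to be installed, instead of a less readable error from subprocess.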

Alternative NLP toolkit

Optionally, you can activate the alternative Stanford CoreNLP toolkit as follows:

  • install Stanford CoreNLP and unzip it in a directory of your choice (e.g., the local directory)
  • edit if needed start_parser.sh with the location of the parser directory
  • override the params class and set corenlp=True

Note however that the Stanford CoreNLP is GPL-licensed, which can place restrictions on proprietary software activating this option.
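The last step in the list above, overriding the params class, could look roughly like the sketch below. The `params` class shown here is only a stand-in with a single assumed field and an assumed default; the real class is defined in params.py and has more settings, and the subclass name `corenlp_params` is made up for this example.

```python
class params:
    """Stand-in for the project's params class (assumption, see params.py for the real one)."""
    corenlp = False  # assumed default: use the stanfordnlp parser


class corenlp_params(params):
    """Override that switches parsing to the Stanford CoreNLP server."""
    corenlp = True


print(params.corenlp, corenlp_params.corenlp)
```

Keeping the change in a subclass leaves the default settings untouched, so the GPL-licensed toolkit is only activated where the override is used.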

Project Description

**The system uses the package text_graph_crafts, which relies on dependency links to build text graphs that, with the help of a centrality algorithm like PageRank, extract relevant keyphrases, summaries and relations from text documents.**

**A SWI-Prolog based module adds an interactive shell for talking about the document with a dialog agent that extracts, for each query, the most relevant sentences covering the document. Spoken dialog is also available if the OS supports it. Developed with Python 3 on OS X, but portable to Linux.**
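To give a feel for what "most relevant sentences for a query" means, here is a deliberately simple sketch that scores sentences by word overlap with the query. This is a swapped-in technique for illustration only, not the Prolog-based method the system uses; the function names and the sample sentences are made up.

```python
import re


def tokens(text):
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_sentences(query, sentences, k=1):
    """Return up to k sentences sharing the most words with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(s)), i) for i, s in enumerate(sentences)]
    # Highest overlap first; ties keep document order.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [sentences[i] for score, i in ranked[:k] if score > 0]


# Made-up sample sentences standing in for a parsed document.
sentences = [
    "A text graph connects words through dependency links.",
    "PageRank assigns a score to every node in the graph.",
]
print(best_sentences("Which algorithm gives each node a score?", sentences))
```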

Dependencies:

  • python 3.7 or newer, pip3, java 9.x or newer, SWI-Prolog 8.x or newer, graphviz
  • also, having git installed is recommended for easy updates
  • pip3 install text_graph_crafts

see how to activate other outputs in file

https://github.com/ptarau/TextGraphCrafts/blob/master/text_graph_crafts/deepRank.py

The second package, which provides the Prolog-based dialog agent, is activated with

python3 -i qpro.py

or the shorthand script qgo.

It requires SWI-Prolog to be installed and available in the path as the executable swipl, as well as the Python-to-Prolog interface pyswip, which is installed with

pip3 install pyswip

It activates a Prolog process to which Python interactively sends queries about a selected document. Answers are computed by Prolog and then, if the parameter quiet is off, spoken using the say OS-level facility (available on OS X and Linux machines).
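The spoken-output step can be pictured with the following minimal sketch, assuming the answer is passed as a single argument to a `say` executable found on the path. The function name `speak` and its return convention are hypothetical; only the quiet parameter and the say facility come from the description above.

```python
import shutil
import subprocess


def speak(text, quiet=False):
    """Speak text with the OS-level say command; return True only if it was run."""
    if quiet:
        return False  # spoken output is switched off
    say = shutil.which("say")
    if say is None:
        return False  # no say executable on this machine
    subprocess.run([say, text], check=False)
    return True


print(speak("The answer is ready.", quiet=True))
```

Returning a flag instead of raising keeps the dialog usable in text-only mode on systems where the facility is absent.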

Prolog relation files, generated on the Python side, are associated with each document, as are the queries about it. They are stored in the same directory as the document.

Try

>>> t1() 
...
>>> t9()
>>> t0()

or

>>> chat('const')

to interactively chat about the US Constitution. The same works
for other documents in the examples folder.

### Handling PDF documents

The easiest way to do this is to install *pdftotext*, which is part of [Poppler tools](https://poppler.freedesktop.org/).

If pdftotext is installed, you can place a file in subdirectory pdfs/
(*textrank.pdf* is already there) and try something similar to:

pdf_chat('textrank')

which activates a dialog about the TextRank paper. Also

pdf_chat('logrank')

activates a dialog about *pdfs/logrank.pdf*, which describes
the architecture of the current system.

Change settings in file params.py to use the system with
other global parameter settings.


deeprank's People

Contributors

ptarau

Forkers

metavai
