GisPy: A Tool for Measuring Gist Inference Score in Text https://aclanthology.org/2022.wnu-1.5/

GisPy: A Tool for Measuring Gist Inference Score in Text

What is Gist? According to Fuzzy-trace theory (FTT), when individuals read a piece of text, they encode two mental representations in parallel: 1) gist and 2) verbatim. The verbatim representation captures surface-level information in the text, while gist captures its bottom-line meaning and underlying semantics.

Inspired by the definition of Gist Inference Score (GIS) by Wolfe et al. (2019) and the implementation of coherence/cohesion indices in Coh-Metrix, we developed GisPy, a tool for measuring GIS in text.

How to run GisPy

  1. Install the requirements: pip install -r requirements.txt
    • We suggest you create a new virtual environment (e.g., a conda environment).
    • If you only want to run GisPy and don't need to run jupyter notebooks, you can skip installing the following packages:
      • matplotlib, textract, wayback
  2. Install the spaCy model: python -m spacy download en_core_web_trf
  3. Put all text documents separately as .txt files (one document per file) in the /data/documents folder.
    • Paragraphs in each document need to be separated by [at least] one newline character (\n).
  4. Run the /gispy/run.py script: python run.py [OUTPUT_FILE_NAME]
    • OUTPUT_FILE_NAME: name of the output file in .csv format where results will be saved.
  5. The output file contains the following information:
    • GIS score for each document in a column named gis
    • Indices and the z-scores of indices
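
The document preparation in step 3 and the reading of results in step 5 can be scripted. The following is a minimal sketch and not part of GisPy: the helper names `write_documents` and `read_gis_scores` and the sample text are hypothetical. Only the data/documents folder, the one-document-per-file layout, the newline paragraph separator, and the `gis` column name come from the steps above.

```python
import csv
import tempfile
from pathlib import Path


def write_documents(docs, out_dir):
    """Write each document to its own .txt file, one paragraph per line.

    `docs` maps a file stem (hypothetical naming) to a list of paragraph strings.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, paragraphs in docs.items():
        # Collapse internal line breaks so each paragraph sits on a single line,
        # then separate paragraphs with one newline character.
        cleaned = [" ".join(p.split()) for p in paragraphs if p.strip()]
        path = out_dir / f"{name}.txt"
        path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


def read_gis_scores(csv_path):
    """Return the values of the `gis` column from a GisPy output file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [float(row["gis"]) for row in csv.DictReader(f)]


# Made-up example; for a real run, point this at the repository's data/documents folder.
demo_root = Path(tempfile.mkdtemp())
demo_paths = write_documents(
    {"doc_1": ["First paragraph,\nwrapped over two lines.", "Second paragraph."]},
    demo_root / "data" / "documents",
)
```

After running python run.py with an output file name, `read_gis_scores` can be pointed at that file to pull out the scores for further analysis.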

ℹ️ Important

GIS is computed from the indices listed in the gis_config.json file. This file is a dictionary that maps each index to its weight, which gives you full flexibility in how GisPy indices are used when computing GIS. You can pick any of the indices from the table below (List of GisPy indices). By default, the config file lists the indices used in the original GIS formula. The format of the config file is as follows:

{
  "index_1": weight of index_1,
  ...
  "index_n": weight of index_n
}

An example:

{
  "PCREF_ap": 1,
  "PCDC": 1,
  "SMCAUSe_1p": 1,
  "SMCAUSwn_a_binary": -1,
  "PCCNC_megahr": -1,
  "WRDIMGc_megahr": -1,
  "WRDHYPnv": -1
}

weight is a real number that is multiplied by the mean of the index values when the index values are linearly combined in the GIS formula. To ignore an index, either leave it out of the dictionary or set its weight to 0.
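
To illustrate how the weights enter the score, here is a minimal sketch of the linear combination. It is not GisPy's own code: the function name `combine` and all numeric index values are made up for illustration, and it assumes each index has already been reduced to a single value per document.

```python
import json

# The example config from above, parsed from JSON text.
config_text = """
{
  "PCREF_ap": 1,
  "PCDC": 1,
  "SMCAUSe_1p": 1,
  "SMCAUSwn_a_binary": -1,
  "PCCNC_megahr": -1,
  "WRDIMGc_megahr": -1,
  "WRDHYPnv": -1
}
"""
weights = json.loads(config_text)


def combine(index_values, weights):
    """Weighted sum of index values; indices with weight 0 are skipped."""
    return sum(w * index_values[name] for name, w in weights.items() if w != 0)


# Hypothetical per-document index values (not real GisPy output).
values = {
    "PCREF_ap": 0.5,
    "PCDC": 1.0,
    "SMCAUSe_1p": 0.25,
    "SMCAUSwn_a_binary": 0.75,
    "PCCNC_megahr": -0.5,
    "WRDIMGc_megahr": 0.25,
    "WRDHYPnv": -1.0,
    "DESPC": 4.0,  # present in the data but absent from the config, so ignored
}

gis = combine(values, weights)
gis_without_pcdc = combine(values, {**weights, "PCDC": 0})
```

With these made-up values, dropping PCDC from the config and setting its weight to 0 give the same result, which matches the two ways of ignoring an index described above.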

List of GisPy indices

The following is a list of all indices generated by GisPy. To make it easier to map these indices to Coh-Metrix indices, we mainly followed the Coh-Metrix index names with some minor modifications (e.g., using different postfixes to indicate the exact implementation method when an index has multiple implementations).

Index | Implementations
Number of Paragraphs | DESPC
Number of Sentences | DESSC
Referential Cohesion | CoREF, PCREF_1, PCREF_a, PCREF_1p, PCREF_ap
Deep Cohesion | PCDC
Semantic Verb Overlap | SMCAUSe_1, SMCAUSe_a, SMCAUSe_1p, SMCAUSe_ap
WordNet Verb Overlap | SMCAUSwn_1p_path, SMCAUSwn_1p_lch, SMCAUSwn_1p_wup, SMCAUSwn_1p_binary, SMCAUSwn_ap_path, SMCAUSwn_ap_lch, SMCAUSwn_ap_wup, SMCAUSwn_ap_binary, SMCAUSwn_1_path, SMCAUSwn_1_lch, SMCAUSwn_1_wup, SMCAUSwn_1_binary, SMCAUSwn_a_path, SMCAUSwn_a_lch, SMCAUSwn_a_wup, SMCAUSwn_a_binary
Word Concreteness | PCCNC_megahr, PCCNC_mrc
Imageability | WRDIMGc_megahr, WRDIMGc_mrc
Hypernymy Nouns & Verbs | WRDHYPnv
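
Because config keys must match the index names listed here, a quick check before running can catch typos. The following is a hypothetical helper, not part of GisPy, and GisPy may handle unknown keys differently. The set of names is transcribed from the list above.

```python
from itertools import product

# Index names transcribed from the list of GisPy indices.
KNOWN_INDICES = {
    "DESPC", "DESSC",
    "CoREF", "PCREF_1", "PCREF_a", "PCREF_1p", "PCREF_ap",
    "PCDC",
    "SMCAUSe_1", "SMCAUSe_a", "SMCAUSe_1p", "SMCAUSe_ap",
    "PCCNC_megahr", "PCCNC_mrc",
    "WRDIMGc_megahr", "WRDIMGc_mrc",
    "WRDHYPnv",
}
# The 16 WordNet Verb Overlap variants follow the pattern SMCAUSwn_<scope>_<method>.
KNOWN_INDICES |= {
    f"SMCAUSwn_{scope}_{method}"
    for scope, method in product(("1", "a", "1p", "ap"), ("path", "lch", "wup", "binary"))
}


def unknown_indices(config):
    """Return config keys that do not appear in the list of GisPy indices."""
    return sorted(set(config) - KNOWN_INDICES)


# "PCREF_app" is a deliberate typo used for illustration.
problems = unknown_indices({"PCDC": 1, "PCREF_app": 1, "SMCAUSwn_a_binary": -1})
```

An empty result means every key in the config corresponds to an index in the list.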

Gist Inference Score (GIS) formula

GIS = Referential Cohesion 
      + Deep Cohesion 
      + (LSA Verb Overlap - WordNet Verb Overlap) 
      - Word Concreteness 
      - Imageability 
      - Hypernymy Nouns & Verbs

Citation

@inproceedings{hosseini-etal-2022-gispy,
    title = "{G}is{P}y: A Tool for Measuring Gist Inference Score in Text",
    author = "Hosseini, Pedram  and
      Wolfe, Christopher  and
      Diab, Mona  and
      Broniatowski, David",
    booktitle = "Proceedings of the 4th Workshop of Narrative Understanding (WNU2022)",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.wnu-1.5",
    doi = "10.18653/v1/2022.wnu-1.5",
    pages = "38--46",
    abstract = "Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions. In this work, we delineate the process of developing GisPy, an opensource tool in Python for measuring the Gist Inference Score (GIS) in text. Evaluation of GisPy on documents in three benchmarks from the news and scientific text domains demonstrates that scores generated by our tool significantly distinguish low vs. high gist documents. Our tool is publicly available to use at: https://github.com/phosseini/GisPy.",
}
