Riveter 💪

Riveter 💪 is a Python package that measures social dynamics between personas mentioned in a collection of texts.

The package identifies and extracts the subjects, verbs, and direct objects in texts; it performs coreference resolution on the personas mentioned in the texts (e.g., clustering "Elizabeth Bennet" and "she" together as one persona); and it measures social dynamics between the personas by referencing a given lexicon. The package currently includes lexica for Sap et al.'s connotation frames of power and agency and Rashkin et al.'s connotation frames of perspective, effect, value, and mental state, but you can also load your own custom lexicon.
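To make the scoring idea concrete, here is a toy sketch in plain Python. It is not Riveter's implementation: the triples are written by hand rather than parsed with spaCy, coreference resolution is skipped, and the lexicon `TOY_LEXICON` and its values are invented for illustration. Riveter's actual scoring may aggregate or normalize differently.

```python
from collections import defaultdict

# Hypothetical mini-lexicon: verb -> (agent score, theme score).
# Values are invented for illustration, not taken from any published lexicon.
TOY_LEXICON = {
    "command": (1, -1),
    "obey": (-1, 1),
}

# Hand-written (subject, verb, direct object) triples standing in for parser output.
triples = [
    ("captain", "command", "crew"),
    ("crew", "obey", "captain"),
    ("captain", "see", "crew"),  # "see" is not in the lexicon, so it is skipped
]


def score_personas(triples, lexicon):
    """Add the agent score to each subject and the theme score to each direct object."""
    totals = defaultdict(int)
    for subject, verb, obj in triples:
        if verb not in lexicon:
            continue
        agent_score, theme_score = lexicon[verb]
        totals[subject] += agent_score
        totals[obj] += theme_score
    return dict(totals)


toy_scores = score_personas(triples, TOY_LEXICON)
print(toy_scores)
```

In this sketch a persona gains or loses score both when it acts (as the subject) and when it is acted upon (as the direct object), which is the intuition behind lexicon-based measures of power.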

The name Riveter is inspired by "Rosie the Riveter," the allegorical figure who came to represent American women working in factories and at other industrial jobs during World War II. Rosie the Riveter has become an iconic symbol of power and shifting gender roles — subjects that the Riveter package aims to help users measure and explore.

Demo video and notebook

Watch our two-minute demo video here: link

Check out our demo notebook here: link

Installation

Quick start

To skip local installation and get started immediately, you can use this Google Colab notebook.

Requirements

Python 3.9
numpy
pandas
seaborn
matplotlib
spacy-experimental

Installation instructions

These instructions have been tested only on macOS machines; other environments are untested.

We strongly recommend creating a new virtual environment. Activate this environment before installing and before running the code.

conda create -n riveterEnv python=3.9
conda activate riveterEnv

Download this repo by using the Git command below or by downloading the repository manually (click the green Code button above, select Download ZIP, and then unzip the downloaded directory).

git clone https://github.com/maartensap/riveter-nlp.git
cd riveter-nlp

Note: If installing on a Mac, you will need Xcode installed to run git from the command line.

Install spacy-experimental and the spaCy model files.

pip install -U spacy-experimental
pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.0/en_coreference_web_trf-3.4.0a0-py3-none-any.whl#egg=en_coreference_web_trf
python -m spacy download en_core_web_sm

Install pandas and seaborn.

conda install pandas
conda install seaborn

Usage

To use Riveter 💪, see the examples in our demo notebook.

This notebook includes both toy and realistic examples and all of the most important function calls.

If you want a quick start:

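from riveter import Riveter  # run from inside the riveter-nlp directory

# texts: list of document strings; text_ids: list of matching document IDs
riveter = Riveter()
riveter.load_sap_lexicon('power')  # or 'agency'
riveter.train(texts,
              text_ids)
persona_score_dict = riveter.get_score_totals()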
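The quick start does not define `texts` and `text_ids`. One way to prepare them is sketched below, assuming they are parallel lists of document strings and document IDs. The corpus, its IDs, and the helper `build_inputs` are hypothetical and not part of Riveter.

```python
# Hypothetical corpus keyed by document ID; the IDs and texts are invented.
corpus = {
    "story_1": "The doctor examined the patient.",
    "story_2": "   ",  # whitespace-only, dropped below
    "story_3": "She thanked the nurse.",
}


def build_inputs(corpus):
    """Return parallel lists of non-empty texts and their document IDs."""
    texts, text_ids = [], []
    for doc_id, text in corpus.items():
        if text.strip():
            texts.append(text)
            text_ids.append(doc_id)
    return texts, text_ids


texts, text_ids = build_inputs(corpus)
```

Keeping the two lists aligned matters because the per-document functions documented below look results up by document ID.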

Note: Here are some instructions for how to run demo.ipynb from the riveterEnv conda environment that you created during installation.

Documentation

`get_score_totals(frequency_threshold=0)`

Get the final scores across the dataset for all entities that meet the frequency threshold.

Name	Type	Description
`frequency_threshold`	integer	Optional: Entities must be matched to at least this many verbs to appear in the output.
RETURNS	dictionary	Dictionary of entities and their total scores.
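Because the return value is a plain dictionary, it can be ranked with standard Python. The sketch below uses a hand-written dictionary in place of real output; the personas, scores, and the helper `rank_personas` are invented for illustration.

```python
# Hypothetical stand-in for the dictionary returned by get_score_totals().
persona_score_dict = {"i": 0.12, "doctor": 0.31, "nurse": -0.05, "patient": -0.22}


def rank_personas(scores, n=2):
    """Return the n highest-scoring and n lowest-scoring (persona, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n], ranked[-n:]


top, bottom = rank_personas(persona_score_dict, n=2)
```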

`plot_scores(number_of_scores=10, title="Personas by Score", frequency_threshold=0)`

Create a bar plot showing the final scores across the dataset.

Name	Type	Description
`number_of_scores`	integer	Optional: Show only the top or bottom number of scores.
`title`	string	Optional: Plot title.
`frequency_threshold`	integer	Optional: Entities must be matched to at least this many verbs to appear in the output.

`get_scores_for_doc(doc_id, frequency_threshold=0)`

Get the final scores in a single document for all entities that meet the frequency threshold.

Name	Type	Description
`doc_id`	string or integer	Show results for this document ID.
`frequency_threshold`	integer	Optional: Entities must be matched to at least this many verbs to appear in the output.
RETURNS	dictionary	Nested dictionary of document IDs, entities, and their total scores.

`plot_scores_for_doc(doc_id, number_of_scores=10, title="Personas by Score", frequency_threshold=0)`

Create a bar plot showing the final scores for a single document.

Name	Type	Description
`doc_id`	string or integer	Show results for this document ID.
`number_of_scores`	integer	Optional: Show only the top or bottom number of scores.
`title`	string	Optional: Plot title.
`frequency_threshold`	integer	Optional: Entities must be matched to at least this many verbs to appear in the output.

`get_persona_polarity_verb_count_dict()`

Gets all the verbs, their frequencies, and whether they contributed positively or negatively to the final scores for every entity. Computed across the whole dataset.

Name	Type	Description
RETURNS	dictionary	Nested dictionary of entities, positive or negative contribution, verbs, and counts.
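A nested dictionary like this is often easier to inspect as flat rows. The sketch below assumes the nesting order described above (entity, then polarity, then verb); the key names `"positive"` and `"negative"`, the sample data, and the helper `flatten_verb_counts` are assumptions made for illustration.

```python
# Hypothetical stand-in for the output of get_persona_polarity_verb_count_dict().
# The polarity key names "positive" and "negative" are assumptions.
polarity_counts = {
    "doctor": {"positive": {"treat": 3, "decide": 1}, "negative": {"wait": 2}},
    "patient": {"positive": {}, "negative": {"wait": 4}},
}


def flatten_verb_counts(nested):
    """Flatten to (persona, polarity, verb, count) rows, most frequent first."""
    rows = []
    for persona, by_polarity in nested.items():
        for polarity, verb_counts in by_polarity.items():
            for verb, count in verb_counts.items():
                rows.append((persona, polarity, verb, count))
    return sorted(rows, key=lambda row: row[3], reverse=True)


rows = flatten_verb_counts(polarity_counts)
```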

`plot_verbs_for_persona(persona, figsize=None, output_path=None)`

Create a heatmap showing the verb counts for a single persona.

Name	Type	Description
`persona`	string	The entity whose results will be shown in the plot.
`figsize`	tuple	Optional: Figure dimensions, e.g. (2, 4).
`output_path`	string	Optional: Where to save the plot as a file.

`get_persona_counts()`

Get the total counts for the entities (all entity matches, whether or not they were matched to a lexicon verb).

Name	Type	Description
RETURNS	dictionary	Dictionary of entities and integer counts.

`count_personas_for_doc(doc_id)`

Get the entity counts for a single document.

Name	Type	Description
`doc_id`	string or integer	Show results for this document ID.
RETURNS	dictionary	Dictionary of entities and integer counts.

`count_scored_verbs_for_doc(doc_id)`

Get the verb counts (verbs that were matched to the lexicon) for a single document.

Name	Type	Description
`doc_id`	string or integer	Show results for this document ID.
RETURNS	dictionary	Dictionary of verbs and integer counts.

`count_nsubj_for_doc(doc_id, matched_only=False)`

Get the noun subject counts for a single document.

Name	Type	Description
`doc_id`	string or integer	Show results for this document ID.
`matched_only`	boolean	Optional: If true, return only the subjects that were matched to identified entities.
RETURNS	dictionary	Dictionary of noun subjects and integer counts.

`count_dobj_for_doc(doc_id, matched_only=False)`

Get the direct object counts for a single document.

Name	Type	Description
`doc_id`	string or integer	Show results for this document ID.
`matched_only`	boolean	Optional: If true, return only the direct objects that were matched to identified entities.
RETURNS	dictionary	Dictionary of direct objects and integer counts.

`get_persona_cluster(persona)`

Get the full entity cluster produced by coreference resolution.

Name	Type	Description
`persona`	string	Show results for this entity.
RETURNS	dictionary	Dictionary of the main entity string and all of its string matches.

`load_sap_lexicon(dimension='power')`

Load the verb lexicon from Sap et al., 2017.

Name	Type	Description
`dimension`	string	Select the lexicon: "power" or "agency".

`load_rashkin_lexicon(dimension='effect')`

Load the verb lexicon from Rashkin et al., 2016.

Name	Type	Description
`dimension`	string	Select the lexicon: one of "effect", "state", "value", "writer_perspective", "reader_perspective", "agent_theme_perspective", or "theme_agent_perspective".

`load_custom_lexicon(lexicon_path, verb_column, agent_column, theme_column)`

Load your own verb lexicon.

Name	Type	Description
`lexicon_path`	string	Path to the lexicon; this should be a TSV file.
`verb_column`	string	Column in the TSV that contains the verb. This should be in the same form as the Rashkin lexicon, e.g. "have", "take".
`agent_column`	string	Column containing the agent score (positive or negative number).
`theme_column`	string	Column containing the theme score (positive or negative number).
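As a sketch of what such a file might look like, the snippet below writes a small TSV with Python's standard `csv` module and reads it back. The verbs, scores, column names (`verb`, `agent`, `theme`), and the presence of a header row are assumptions made for illustration; only the `load_custom_lexicon` signature in the final comment comes from the documentation above.

```python
import csv
import os
import tempfile

# Hypothetical lexicon rows; the verbs and scores are invented for illustration.
lexicon_rows = [
    {"verb": "rescue", "agent": 1, "theme": -1},
    {"verb": "beg", "agent": -1, "theme": 1},
]

# Write a tab-separated file with a header row naming the three columns.
lexicon_path = os.path.join(tempfile.mkdtemp(), "toy_lexicon.tsv")
with open(lexicon_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["verb", "agent", "theme"], delimiter="\t")
    writer.writeheader()
    writer.writerows(lexicon_rows)

# Read the file back to check that it parses as expected.
with open(lexicon_path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f, delimiter="\t"))

# With Riveter installed, the file could then be passed to the documented method:
# riveter.load_custom_lexicon(lexicon_path, verb_column="verb",
#                             agent_column="agent", theme_column="theme")
```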

`get_documents_for_verb(target_verb)`

Find all the documents matched to the verb.

Name	Type	Description
`target_verb`	string	The verb you'd like to match.
RETURNS	(list, list)	List of matched document IDs, list of matched document texts.

`get_documents_for_persona(target_persona)`

Find all the documents matched to the persona.

Name	Type	Description
`target_persona`	string	The persona you'd like to match.
RETURNS	(list, list)	List of matched document IDs, list of matched document texts.
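Both document lookup functions return a pair of lists. A minimal sketch of consuming that pair follows, assuming the two lists are aligned by position; the sample IDs, texts, and the helper `pair_documents` are hypothetical.

```python
# Hypothetical stand-in for the (IDs, texts) tuple returned by
# get_documents_for_verb() or get_documents_for_persona().
matched = (["doc_1", "doc_7"], ["The doctor treated her.", "The doctor waited."])


def pair_documents(result):
    """Map each matched document ID to its text, assuming aligned lists."""
    doc_ids, doc_texts = result
    return dict(zip(doc_ids, doc_texts))


docs_by_id = pair_documents(matched)
```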

Authorship and Citation

This package was created by an interdisciplinary team including Maria Antoniak, Anjalie Field, Jimin Mun, Melanie Walsh, Lauren F. Klein, and Maarten Sap. You can find our paper at the following URL: http://maartensap.com/pdfs/antoniak2023riveter.pdf

Use the following BibTeX to cite the paper:

@article{antoniak2023riveter,
  title={Riveter: Measuring Power and Social Dynamics Between Entities},
  author={Antoniak, Maria and Field, Anjalie and Mun, Ji Min and Walsh, Melanie and Klein, Lauren F. and Sap, Maarten},
  year={2023},
  url={http://maartensap.com/pdfs/antoniak2023riveter.pdf}
}
