FairSumm: A fair text summarization algorithm

Code and dataset used in the paper "Summarizing User-generated Textual Content: Motivation and Methods for Fairness in Algorithmic Summaries", presented at the 2019 ACM Conference on Computer-Supported Cooperative Work and Social Computing (ACM CSCW).

If you use this code or dataset in a research publication, please cite the following paper as the source.

Abhisek Dash, Anurag Shandilya, Arindam Biswas, Kripabandhu Ghosh, Saptarshi Ghosh, and Abhijnan Chakraborty. "Summarizing User-generated Textual Content: Motivation and Methods for Fairness in Algorithmic Summaries". Proceedings of the ACM on Human-Computer Interaction, ACM, vol. 3, no. CSCW, Article 172, November 2019.

BibTeX:

@article{dash2019summarizing,
title={Summarizing User-generated Textual Content: Motivation and Methods for Fairness in Algorithmic Summaries},
author={Dash, Abhisek and Shandilya, Anurag and Biswas, Arindam and Ghosh, Kripabandhu and Ghosh, Saptarshi and Chakraborty, Abhijnan},
journal={Proceedings of the ACM on Human-Computer Interaction},
volume={3},
number={CSCW},
pages={172},
year={2019},
publisher={ACM}
}

Prerequisites

-JDK 1.7 or greater
 
-Python
 -nltk
 -pandas
 -numpy
 -scipy

Basic Usage

Example

To run FairSumm on the Claritin dataset with the equal representation fairness notion and a summary length of 50 tweets, execute the following command from the project home directory:
python FairSumm.py --file Claritin.txt

Options

To list the other options available for FairSumm, run:
python FairSumm.py --help

Datasets

We use three tweet datasets to generate fair summaries, related to (a) Claritin drug side-effects, (b) the MeToo movement and (c) the US presidential election. The datasets can be found in the dataset folder, and details are given in the paper.

Input

The supported input text file format is as follows:

-Input file for FairSumm.py (the default settings are for the equal representation fairness notion).
-Change the settings to suit your requirements by giving the desired number of tweets from each class.

 -input<||>input dataset
 -length<||>length of the output summary
 -num_groups<||>number of socially salient groups in the dataset
 -group1<||>required number of tweets from group 1 in the final summary
 -group2<||>required number of tweets from group 2 in the final summary

-Tweets to summarize
 -tweetId<||>tweetLabel<||>tweetText
 
-Similarity between tweets
 -.csv file with similarity scores between tweets (To be generated)
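
The README does not specify how the similarity file is generated or laid out. As a rough sketch only, the code below computes pairwise cosine similarity over whitespace-tokenized bag-of-words vectors and writes it as a dense square matrix, one row per tweet. The tokenization, the matrix layout, and the function names (`cosine_similarity`, `similarity_matrix`, `write_similarity_csv`) are all assumptions, not the format FairSumm.py is confirmed to read.

```python
import csv
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tweet is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(texts):
    """Return an n x n list of lists of pairwise similarities."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    vectors = [Counter(t.lower().split()) for t in texts]
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def write_similarity_csv(matrix, path):
    """Write the matrix as a CSV file, one row per tweet (assumed layout)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(matrix)
```

Check the expected file name and layout against FairSumm.py before using such a file as input.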

Claritin.txt, METOO.txt and US-Election.txt are sample input files for the three datasets, respectively. Each generates a summary of 50 tweets that follows the equal representation fairness notion.
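
To make the `<||>`-separated formats above concrete, here is a minimal parsing sketch. It assumes each settings line is exactly `key<||>value` and each tweet line is `tweetId<||>tweetLabel<||>tweetText`. The helper names (`parse_settings`, `group_quotas`, `parse_tweet`) are hypothetical and are not taken from FairSumm.py.

```python
SEP = "<||>"


def parse_settings(lines):
    """Parse key<||>value settings lines into a dict of strings."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, value = line.split(SEP, 1)
        settings[key.strip()] = value.strip()
    return settings


def group_quotas(settings):
    """Return {"group1": n1, "group2": n2, ...} as integers."""
    n = int(settings["num_groups"])
    return {f"group{i}": int(settings[f"group{i}"]) for i in range(1, n + 1)}


def parse_tweet(line):
    """Split one tweet line into (tweet_id, label, text)."""
    # maxsplit=2 keeps any later "<||>" inside the tweet text intact.
    tweet_id, label, text = line.rstrip("\n").split(SEP, 2)
    return tweet_id, label, text
```

Under the equal representation notion with two groups and a summary length of 50, one would expect `group_quotas` to return 25 for each group. Whether FairSumm.py itself requires the quotas to sum to `length` is not stated here.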

Output

The summary, containing the specified number of tweets, is stored in the Summaries folder (created if it does not exist) under the name of the input dataset.

If you set the evaluation variable to 1, Rouge scores are computed and stored as follows:
Rouge-1 and Rouge-2 recall and F-scores are written to an additional file, Final_Output.txt (created automatically), in the parent directory, in the following tab-separated order:

 -SummaryName	Rouge-1 Recall	Rouge-1 F-Score	Rouge-2 Recall	Rouge-2 F-Score
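
A small sketch for reading these scores back, assuming each line of Final_Output.txt holds exactly the five tab-separated fields listed above. Whether the file includes a header row is not stated, so rows whose score fields are not numeric are skipped. The column keys and the `read_rouge_scores` name are illustrative choices, not part of FairSumm.

```python
import csv

# Illustrative keys for the five tab-separated fields described above.
COLUMNS = ["summary", "rouge1_recall", "rouge1_f", "rouge2_recall", "rouge2_f"]


def read_rouge_scores(lines):
    """Parse tab-separated score lines into a list of dicts."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        try:
            scores = [float(x) for x in fields[1:]]
        except ValueError:
            continue  # skip a possible header row
        rows.append(dict(zip(COLUMNS, [fields[0]] + scores)))
    return rows
```

The function accepts any iterable of lines, so an open file object for Final_Output.txt can be passed directly.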

To add more human-generated summaries, place them in the Test_Summaries folder of the corresponding dataset.

Miscellaneous

Please send any questions you might have about the code and/or the algorithm to [email protected].

Note: This is only a reference implementation of the FairSumm algorithm and could benefit from several performance enhancement schemes, some of which are discussed in the paper.

fairsumm's Issues

Dataset is incomplete

Hi,
I am not able to run the code because a few files are missing from the dataset, such as 'cosinescores.zip'.
