

Reproducing Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution

Now accepted to the AAAI 2024 Main Track. Please see https://github.com/2dot71mily/uspec for the most up-to-date version of this paper.

Follow the steps below to reproduce all data and plots.

Interact with open source demos instead!

If you would rather check out currently running, more flexible demos, you can also reproduce the methods in this paper with the open-source demos below. Otherwise, skip to Setup.

Setup

git clone https://github.com/2dot71mily/sib_paper.git
cd sib_paper
python3 -m venv venv_sib
source venv_sib/bin/activate
python3 -m pip install --upgrade pip
# Without a GPU, comment out the `nvidia.*` and `triton` entries in requirements.txt
# BERT-like and OpenAI models can be tested without a GPU
pip install -r requirements.txt
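On a machine without a GPU, you can comment out the GPU-only entries with a script instead of editing requirements.txt by hand. The sketch below does that. It assumes each requirement sits on its own line and starts with the package name, as in `nvidia-cublas-cu12==...` or `triton==...`. The output file name `requirements-cpu.txt` is hypothetical.

```python
import re
from pathlib import Path

# Matches requirement lines for NVIDIA CUDA wheels (nvidia-*) and for triton,
# but not look-alikes such as `tritonclient` or lines already commented out.
GPU_ONLY = re.compile(r"^\s*(nvidia[\w.-]*|triton)\s*(?:[=<>!~;\[]|$)", re.IGNORECASE)


def comment_out_gpu_requirements(lines):
    """Return requirement lines with GPU-only packages commented out."""
    return ["# " + line if GPU_ONLY.match(line) else line for line in lines]


def write_cpu_requirements(src="requirements.txt", dst="requirements-cpu.txt"):
    """Write a CPU-only copy of `src` to `dst` (hypothetical file name)."""
    lines = Path(src).read_text().splitlines()
    Path(dst).write_text("\n".join(comment_out_gpu_requirements(lines)) + "\n")
    return dst
```

After calling `write_cpu_requirements()`, install with `pip install -r requirements-cpu.txt` in place of the last command above.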

Reproducing plots with existing data

Method 1

In config.py, set METHOD_IDX:

######################## Select method type #################################
METHODS = ['correlation_measurement', "specification_metric"] 
METHOD_IDX = 0  # Select 0 for Method 1, 1 for Method 2

Then run in a terminal: python spurious_plotting.py
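Each step in this README edits a top-level constant in config.py. If you prefer to script these edits, the sketch below does it with a regex rewrite. It assumes every setting is a single one-line `NAME = value` assignment, as in the snippets shown here. `set_config_value` and `update_config` are hypothetical helpers and are not part of the repository.

```python
import re
from pathlib import Path


def set_config_value(source, name, value):
    """Return `source` with the top-level line `name = ...` set to repr(value).

    A trailing `# comment` on that line is kept. Raises KeyError unless the
    name is assigned exactly once at the start of a line.
    """
    pattern = re.compile(
        rf"^({re.escape(name)}\s*=\s*)([^#\n]*?)(\s*#.*)?$", re.MULTILINE
    )
    new_source, count = pattern.subn(
        lambda m: m.group(1) + repr(value) + (m.group(3) or ""), source
    )
    if count != 1:
        raise KeyError(f"expected one assignment to {name}, found {count}")
    return new_source


def update_config(path="config.py", **settings):
    """Apply several settings to a config file in place (hypothetical helper)."""
    config_file = Path(path)
    text = config_file.read_text()
    for name, value in settings.items():
        text = set_config_value(text, name, value)
    config_file.write_text(text)
```

For example, call `update_config(METHOD_IDX=0)` before running `python spurious_plotting.py`. The regex only handles simple one-line assignments, so check the resulting diff of config.py.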

Method 2

In config.py, set METHOD_IDX and WINO_G_ID (whether to use the Simplified Winogender eval set):

######################## Select method type #################################
METHODS = ['correlation_measurement', "specification_metric"] 
METHOD_IDX = 1  # Select 0 for Method 1, 1 for Method 2

...
# Set to `True` if testing with `Simplified` Winogender-gender-identified eval set
WINO_G_ID = False  

Then run in a terminal: python uncertainty_plotting.py

Note: this step is somewhat slow because it ingests a large number of files.
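The two plotting steps above differ only in which script matches the selected method. The sketch below encodes that correspondence, taken from the steps in this README. It assumes `METHODS` is ordered as shown in config.py. `plotting_command` is a hypothetical helper.

```python
import sys

METHODS = ["correlation_measurement", "specification_metric"]

# Plotting script for each method, as listed in the steps above.
PLOTTING_SCRIPTS = {
    "correlation_measurement": "spurious_plotting.py",  # Method 1
    "specification_metric": "uncertainty_plotting.py",  # Method 2
}


def plotting_command(method_idx):
    """Return the command that plots results for METHODS[method_idx]."""
    if not 0 <= method_idx < len(METHODS):
        raise ValueError(f"METHOD_IDX must be between 0 and {len(METHODS) - 1}")
    return [sys.executable, PLOTTING_SCRIPTS[METHODS[method_idx]]]
```

From the repository root, `subprocess.run(plotting_command(config.METHOD_IDX), check=True)` should launch the matching script, provided `import config` resolves to the repository's config.py.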

Reproducing the data used in the plots

First, test your setup as described below.

Testing your setup

Note: model weights are downloaded from Hugging Face and cached locally. Throughout, the following warning is expected: Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
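This warning is expected because the `bert-base-uncased` checkpoint also stores the next-sentence-prediction head (`cls.seq_relationship.*`), and `BertForMaskedLM` does not use that head. If the warning clutters your logs, the sketch below drops just that one message using the standard `logging` module. It assumes the warning is emitted through the `transformers.modeling_utils` logger, which may differ across transformers versions. The class and function names are hypothetical.

```python
import logging

EXPECTED = "Some weights of the model checkpoint at bert-base-uncased were not used"


class DropExpectedBertWarning(logging.Filter):
    """Suppress only the known, benign unused-NSP-head warning."""

    def filter(self, record):
        # Returning False drops the record; every other message passes through.
        return EXPECTED not in record.getMessage()


def install_filter(logger_name="transformers.modeling_utils"):
    """Attach the filter to the logger assumed to emit the warning."""
    bert_filter = DropExpectedBertWarning()
    logging.getLogger(logger_name).addFilter(bert_filter)
    return bert_filter
```

Call `install_filter()` before loading the model. A broader option is `transformers.logging.set_verbosity_error()`, which silences all transformers warnings and so hides more than this one message.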

Method 1

In config.py, set TESTING, METHOD_IDX, INFERENCE_TYPE_IDX and MODEL_IDX:

TESTING = True  # Set to True if testing on subsets of the challenge sets
######################## Select method type #################################
METHODS = ['correlation_measurement', "specification_metric"]  
METHOD_IDX = 0  # Select 0 for Method 1, 1 for Method 2
...
########################## Select inference type ############################
INFERENCE_TYPES = ["bert_like", "open_ai", "cond_gen"]
INFERENCE_TYPE_IDX = 2
# Select 0 for `bert_like` (BERT-like), 1 for `open_ai`, 2 for `cond_gen` (UL2 family)
...
###################### Select model type #######################
...
CONDITIONAL_GEN_MODELS = ["UL2-20B-denoiser", "UL2-20B-gen", "Flan-UL2-20B"]
MODEL_IDX = 0
##################### If model is GPT ##################
# OpenAI API key, required only for `open_ai` inference
OPENAI_API_KEY = '<OpenAI API key>'

Then run in a terminal: python inference.py
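Together, the indices above select the method, the inference backend and the model. The sketch below shows how that index-based selection resolves and flags common mistakes before a long run. The BERT-like and OpenAI model lists are elided in this README, so the sketch takes them as a parameter. `resolve_selection` is hypothetical and is not the repository's code.

```python
METHODS = ["correlation_measurement", "specification_metric"]
INFERENCE_TYPES = ["bert_like", "open_ai", "cond_gen"]
CONDITIONAL_GEN_MODELS = ["UL2-20B-denoiser", "UL2-20B-gen", "Flan-UL2-20B"]


def resolve_selection(method_idx, inference_type_idx, model_idx, model_lists,
                      openai_api_key=None):
    """Return (method, inference_type, model) for the given config indices.

    `model_lists` maps each inference type to its list of model names.
    Raises ValueError for out-of-range indices or a missing OpenAI key.
    """
    if not 0 <= method_idx < len(METHODS):
        raise ValueError(f"METHOD_IDX must be between 0 and {len(METHODS) - 1}")
    if not 0 <= inference_type_idx < len(INFERENCE_TYPES):
        raise ValueError(
            f"INFERENCE_TYPE_IDX must be between 0 and {len(INFERENCE_TYPES) - 1}"
        )
    inference_type = INFERENCE_TYPES[inference_type_idx]
    models = model_lists[inference_type]
    if not 0 <= model_idx < len(models):
        raise ValueError(f"MODEL_IDX must be between 0 and {len(models) - 1}")
    # The placeholder '<OpenAI API key>' counts as "not set".
    if inference_type == "open_ai" and (
        not openai_api_key or openai_api_key.startswith("<")
    ):
        raise ValueError("OPENAI_API_KEY must be set for open_ai inference")
    return METHODS[method_idx], inference_type, models[model_idx]
```

With the values shown above, `resolve_selection(0, 2, 0, {"cond_gen": CONDITIONAL_GEN_MODELS})` returns `("correlation_measurement", "cond_gen", "UL2-20B-denoiser")`.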

Method 2

In config.py, set TESTING, METHOD_IDX, WINO_G_ID (whether to use the Simplified Winogender eval set), INFERENCE_TYPE_IDX and MODEL_IDX:

TESTING = True  # Set to True if testing on subsets of the challenge sets
######################## Select method type #################################
METHODS = ['correlation_measurement', "specification_metric"]  
METHOD_IDX = 1  # Select 0 for Method 1, 1 for Method 2
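...
# Set to `True` if testing with `Simplified` Winogender-gender-identified eval set
WINO_G_ID = False
...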
########################## Select inference type ############################
INFERENCE_TYPES = ["bert_like", "open_ai", "cond_gen"]
INFERENCE_TYPE_IDX = 2
# Select 0 for `bert_like` (BERT-like), 1 for `open_ai`, 2 for `cond_gen` (UL2 family)
...
###################### Select model type #######################
...
CONDITIONAL_GEN_MODELS = ["UL2-20B-denoiser", "UL2-20B-gen", "Flan-UL2-20B"]
MODEL_IDX = 0
##################### If model is GPT ##################
# OpenAI API key, required only for `open_ai` inference
OPENAI_API_KEY = '<OpenAI API key>'

Then run in a terminal: python inference.py
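Both test configurations store the OpenAI key as a literal string in config.py. Because config.py is ordinary Python, one alternative is to read the key from an environment variable so it never ends up in version control. This is an assumption, not how the repository ships. The sketch below shows the idea. The variable name `OPENAI_API_KEY` follows the usual OpenAI convention, and `openai_key_from_env` is a hypothetical helper.

```python
import os

PLACEHOLDER = "<OpenAI API key>"


def openai_key_from_env(env=None):
    """Return the OpenAI key from the environment, or the placeholder if unset."""
    env = os.environ if env is None else env
    return env.get("OPENAI_API_KEY", PLACEHOLDER)


# In config.py, this line would replace the hard-coded string.
OPENAI_API_KEY = openai_key_from_env()
```

Then run `export OPENAI_API_KEY=...` in the shell before `python inference.py`.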

Reproducing all the data

Method 1

In config.py, use the same settings as for Method 1 above, but set:

TESTING = False 

Then run in a terminal: python inference.py

Method 2

In config.py, use the same settings as for Method 2 above, but set:

TESTING = False 

Then run in a terminal: python inference.py
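Reproducing all the data means repeating the full run for each method and model you need. The sketch below is a driver that enumerates the config.py overrides for every full run of the UL2-family models, the only model list shown in full here. It assumes inference.py reads config.py when it starts. The caller must supply `apply_overrides`, a function that writes one set of overrides into config.py, either by hand or with a small rewrite script. By default the driver only prints the planned runs. The helpers are hypothetical.

```python
import itertools
import subprocess
import sys

CONDITIONAL_GEN_MODELS = ["UL2-20B-denoiser", "UL2-20B-gen", "Flan-UL2-20B"]


def planned_runs(method_idxs=(0, 1), model_names=CONDITIONAL_GEN_MODELS):
    """Yield the config.py overrides for each full (non-test) UL2-family run."""
    for method_idx, model_idx in itertools.product(method_idxs, range(len(model_names))):
        yield {
            "TESTING": False,
            "METHOD_IDX": method_idx,
            "INFERENCE_TYPE_IDX": 2,  # cond_gen, i.e. the UL2 family
            "MODEL_IDX": model_idx,
        }


def run_all(apply_overrides, dry_run=True):
    """Print each planned run; unless dry_run, apply it and run inference.py."""
    for overrides in planned_runs():
        print(overrides)
        if not dry_run:
            apply_overrides(overrides)
            subprocess.run([sys.executable, "inference.py"], check=True)
```

Running the plan sequentially means each run starts from a freshly edited config.py. A failed run stops the loop (`check=True`) instead of silently moving on to the next model.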
