Coder Social home page Coder Social logo

semantic_uncertainty's Introduction

Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

image

Overview

This repository contains the code used in Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation (arXiv)

run_pipeline.sh is a slurm batch script that executes all steps of our pipeline. sbatch run_pipeline.sh submits the batch script.

Preprocessing & Config

parse_triviaqa.py and parse_coqa.py load TriviaQA and CoQA from HuggingFace, tokenize it and store the data sets. These scripts only have to be run once.

You'll also have to set the paths where you would like to store intermediate and final results of the pipeline in config.py.

The environment.yml lists the dependencies of the conda environment we used for our experiments.

Generating answers and computing uncertainty measures

The components of our pipeline are:

  • generate.py generates a number of answers for a subset of questions of a given data set. This step also evaluates the question-answering accuracy of the generated answers.
  • clean_generations.py post-processes the generations from the first step, mainly by removing any unwanted trailing text, e.g. in cases where the model first gives the answer to the given question and then generates an additional question.
  • get_semantic_similarities.py identifies semantic clusters in the generated set of answers from the previous step.
  • get_prompting_based_uncertainty.py computes the p(True) baseline.
  • compute_likelihoods.py computes the likelihoods of the generated answers under the generating model.
  • compute_confidence_measure.py computes a range of different conficence/uncertainty measures such as the semantice entropy predictive entropy, lexical similarity, and p(True).

Analyzing results

After running the pipeline, use analyze_result.py to compute performance metrics, such as the AUROC.

Hardware requirements

Most model runs should run with at most 40GB of GPU memory. An exception are the experiments on OPT-30B which we run on two 80GB A100s.

Dependencies

Our implemenetation uses PyTorch and HuggingFace. We use wandb to track our runs. environment

semantic_uncertainty's People

Contributors

lorenzkuhn avatar caleb-kaiser avatar lorenz-openai avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.