This project forked from chats-lab/kokomind

KokoMind: Can LLMs Understand Social Interactions?

Home Page: https://chats-lab.github.io/KokoMind/

License: Apache License 2.0

JavaScript 65.24% Python 9.42% CSS 4.84% HTML 20.49%

KokoMind

License Python 3.9+

This is the repo for KokoMind, a dataset of multi-party social interactions for evaluating LLMs' social understanding abilities.


News

Demo

demo2.mp4

Dataset

KokoMind contains 150 complex multi-party social interactions (50 per source) with free-text questions and answers. To ensure diversity and scalability and to avoid data contamination, all social interactions, questions, and answers are generated by GPT-4 and then verified by human experts. The generations draw on three different sources:

  • 🤖 GPT-4-only: This subset is created solely by GPT-4 through prompting, without grounding in existing sources.
  • 🎦 Movie-based: To avoid data contamination, this portion of the data is grounded in diverse scenarios pulled from movies released after 2022. GPT-4 shapes these situations, maintaining the core essence while adding its own elements.
  • 🧠 ToMi-based: This segment is built on a simulated dataset, ToMi, which involves moving physical objects to different places, a classic test for theory of mind. These social interactions are again embellished and expanded by GPT-4.

For each social interaction, we ask various questions designed to probe the following aspects of social understanding.

  • 🧠 Theory of Mind: Questions evaluating understanding of others' mental states and perspectives.
  • 👍 Social Norm: Questions aiming to discern societal values and norms within the situations.
  • 😃 Emotion Recognition: Questions targeted at identifying and understanding emotional elements within the context.
  • 👨‍👩‍👧 Social Relation: Queries focusing on interpersonal dynamics and relationships.
  • 🤔 Counterfactual Questions: Hypothetical queries designed to explore alternative outcomes or possibilities.
  • 📝 Social Advice: Questions eliciting advice or action recommendations relevant to the given situation.

question_nonverbal_yes_v0.1.jsonl contains 770 samples in total. It is a JSON Lines file: each line is a dictionary with the following fields:

  • question_id: int, the unique ID of the question.
  • text: str, social interaction context and question.
  • answer: str, the GPT-4 answer, further verified by humans.
  • source: str, one of the three data sources: gpt-4, movie, tomi.
  • category: str, one of six question categories: ToM, Social Norm, Emotion Recognition, Social Relation, Counterfactual, Social Advice.
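
As a rough illustration of the record layout described above, the following sketch parses JSON Lines text and tallies questions by `source` and `category`. The helper name `load_questions` and the two sample records are made up for the example; only the field names come from the list above.

```python
import json
from collections import Counter


def load_questions(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


# Two made-up records that follow the field layout described above.
sample_lines = [
    '{"question_id": 1, "text": "...", "answer": "...", "source": "gpt-4", "category": "ToM"}',
    '{"question_id": 2, "text": "...", "answer": "...", "source": "movie", "category": "Social Norm"}',
]

questions = load_questions(sample_lines)
by_source = Counter(q["source"] for q in questions)
by_category = Counter(q["category"] for q in questions)
print(by_source)
print(by_category)
```

Because `load_questions` accepts any iterable of lines, an open file handle for the dataset file can be passed in place of `sample_lines`.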

question_nonverbal_no_v0.1.jsonl contains the same social interactions and questions, but with the non-verbal cues in parentheses (e.g., nervously sipping coffee) removed from the context.
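
To make the difference between the two files concrete, here is a minimal sketch of stripping parenthesized cues from a context string. It is not the procedure used to build the dataset, which this README does not describe; it assumes that every parenthesized span is a non-verbal cue, and the function name `strip_nonverbal_cues` and the sample sentence are hypothetical.

```python
import re

# Matches a parenthesized span plus any spaces or tabs directly before it.
_PAREN_CUE = re.compile(r"[ \t]*\([^()]*\)")


def strip_nonverbal_cues(text):
    """Remove parenthesized spans and tidy the leftover spacing."""
    without_cues = _PAREN_CUE.sub("", text)
    # Collapse runs of spaces or tabs left behind; newlines are untouched.
    return re.sub(r"[ \t]{2,}", " ", without_cues).strip()


example = "Anna: (nervously sipping coffee) I think the plan is fine."
print(strip_nonverbal_cues(example))
```

Note that this pattern would also drop ordinary parenthetical remarks, so it is only a reasonable approximation when parentheses are reserved for cues.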

Evaluation

Pre-requisite

pip install -r requirements.txt
export OPENAI_API_KEY=<your_api_key>
export ANTHROPIC_API_KEY=<your_api_key>
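
Before launching the baseline scripts, it can help to confirm that both keys are visible to Python. The check below is an optional sketch and not part of the repo; the function name `missing_api_keys` is hypothetical.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All API keys are set.")
```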

Generate model answers

# Generate local model answers
# Use vicuna-7b as an example
python eval/get_model_answer.py --model-path ${PATH_TO_LOCAL_HF_MODEL} --model-id vicuna-7b --question-file data/question_nonverbal_yes_v0.1.jsonl --answer-file data/answer/answer_vicuna-7b.jsonl --num-gpus 8

# GPT-3 answer (reference model by alpaca-eval)
python eval/qa_baseline_gpt3.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_gpt3.jsonl

# GPT-3.5 answer
python eval/qa_baseline_gpt35.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_gpt35.jsonl

# GPT-4 answer
python eval/qa_baseline_gpt4.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_gpt4.jsonl

# Claude answer
python eval/qa_baseline_claude.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_claude.jsonl

Run evaluation

Our evaluation is based on Alpaca-Eval.

# Convert to alpaca_eval input format
python eval/generate_alpaca_eval.py -q data/question_nonverbal_yes_v0.1.jsonl -a data/answer/answer_gpt3.jsonl -o data/alpaca_eval/answer_gpt3.json

alpaca_eval make_leaderboard --leaderboard_path data/alpaca_results/leaderboard.csv --all_model_outputs "./data/alpaca_eval/answer_*" --reference_outputs data/alpaca_eval/answer_gpt3.json --is_overwrite_leaderboard True
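
For orientation, the conversion step can be pictured roughly as follows. This is a sketch, not the contents of eval/generate_alpaca_eval.py: alpaca_eval reads model outputs as a JSON list of records with `instruction`, `output`, and `generator` keys, but the answer-file fields used here (`question_id` and `text`) are an assumption, and `to_alpaca_eval` is a hypothetical helper.

```python
import json


def to_alpaca_eval(questions, answers, generator):
    """Pair questions with answers by question_id in alpaca_eval's layout."""
    # Assumed answer layout: {"question_id": ..., "text": ...}.
    answer_by_id = {a["question_id"]: a["text"] for a in answers}
    return [
        {
            "instruction": q["text"],
            "output": answer_by_id[q["question_id"]],
            "generator": generator,
        }
        for q in questions
        if q["question_id"] in answer_by_id
    ]


# Made-up records for illustration only.
questions = [{"question_id": 1, "text": "Context ... Question: How does Anna feel?"}]
answers = [{"question_id": 1, "text": "Anna seems anxious."}]

records = to_alpaca_eval(questions, answers, generator="gpt3")
print(json.dumps(records, indent=2))
```

Questions without a matching answer are skipped here rather than raising an error; a stricter script might prefer to fail loudly so that incomplete answer files are noticed.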

License

This project is an early-stage research showcase intended solely for non-commercial use. It adheres to OpenAI's data usage terms and ShareGPT's privacy practices. Let us know if you spot any potential violations. The code is available under the Apache License 2.0.

Acknowledgement

We would like to thank Yejin Choi from UW, Louis-Philippe Morency from CMU, Jason Weston from Meta, and Diyi Yang from Stanford for their enlightening dialogues and constructive inputs.

Citation

Please cite our work if you find it useful.

@misc{Shi_KokoMind_Can_Large_2023,
  author = {Shi, Weiyan and Qiu, Liang and Xu, Dehong and Sui, Pengwei and Lu, Pan and Yu, Zhou},
  title = {{KokoMind: Can Large Language Models Understand Social Interactions?}},
  month = jul,
  year = {2023},
  url = {https://chats-lab.github.io/KokoMind/}
}

